Migrating My Personal Website to Confluence Cloud, Transformed by AWS

My personal website had primarily been hosted on Squarespace, but I had been a huge fan of Confluence since I first encountered it a year back, largely because of its flexibility in organising information. So I had an idea: why not use Confluence as my personal website as well? I started exploring and migrating my content over to Confluence Cloud.

After all my content was migrated onto Confluence Cloud, I began working on the look-and-feel of the pages and on allowing public access without a login. First I enabled anonymous access to a single space, PUBLIC. Then I looked for ways to hide certain Confluence elements (like that ugly navigation sidebar) and banners, add Google Analytics, and do the other things a personal website should have.

Nothing came up.

If I had been on a self-hosted instance of Confluence, I might have been able to reskin and retheme the site relatively easily. But I was on Confluence Cloud, and such features were limited.

I thought of deploying my own Confluence on AWS, but the hosting cost could have easily escalated with an EC2 and RDS deployment.

Confluence Marketplace Add-ons

I then looked around the Atlassian marketplace and tried out the following add-ons:

  • Refined Spaces - I could not get this to work.
  • Instant Websites - It cost an additional $25 a month, way beyond my budget, and it did not play well with some macros I was using, like the HTML macro that inserted JavaScript, and the page properties listing macros.
  • Scroll WP Publisher - I came across this, but did not try it out. It might be limiting in similar ways to Instant Websites.

As the old saying goes, if you need something done, you had better do it yourself. And so I had an idea.

AWS CloudFront + Lambda@Edge + Confluence Cloud

So this was the basic idea. I would continue to host my website on Confluence Cloud. It would give me high availability, operational stability and performance at an extremely reasonable rate (USD 10 a month; try doing that yourself on AWS!). Then I would point my domain at AWS CloudFront. CloudFront would use Confluence Cloud as an origin, and I would use Lambda@Edge to transform the response. Sounds great in theory.

AWS Lambda@Edge Design

I would have two Lambda@Edge functions:

  • One would be triggered upon Viewer Request. This would help me do friendly URL mappings or redirections. I wanted to retain existing website links that had been shared or crawled on the internet, so that visitors following those older links would not land on missing pages.
  • The other would be triggered upon Origin Response. This would transform the response, adding extra CSS styles and JavaScript to the pages being served.
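
A viewer-request handler along these lines could do the URL mapping. It is a minimal sketch in Python, and the paths in `LEGACY_REDIRECTS` are made-up examples rather than real URLs from the site.

```python
# Hypothetical mapping of old site paths to Confluence Cloud paths.
LEGACY_REDIRECTS = {
    "/about": "/wiki/spaces/PUBLIC/pages/1001/About",
    "/blog/hello-world": "/wiki/spaces/PUBLIC/pages/1002/Hello+World",
}


def handler(event, context):
    """Lambda@Edge viewer-request handler: redirect known legacy paths."""
    request = event["Records"][0]["cf"]["request"]
    # Normalise a trailing slash so "/about/" and "/about" match the same entry.
    path = request["uri"].rstrip("/") or "/"
    target = LEGACY_REDIRECTS.get(path)
    if target is None:
        # Unknown path: hand the request back to CloudFront unchanged.
        return request
    # Returning a response object short-circuits the request at the edge.
    return {
        "status": "301",
        "statusDescription": "Moved Permanently",
        "headers": {
            "location": [{"key": "Location", "value": target}],
        },
    }
```

Keeping the mapping in a plain dictionary means old links can be added without touching the handler logic.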

AWS Lambda@Edge Limitations

But reality soon stepped in.

Firstly, Lambda@Edge does not let you modify the response body coming back from the origin before it is served to the viewer.

I then tried a workaround. What if I used the origin request handler in Lambda@Edge to make an HTTP request to my origin directly, and returned a modified response?
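
Roughly, the workaround would look like the sketch below: the origin-request handler fetches the page itself, injects a snippet into the `<head>`, and returns a generated response instead of forwarding the request. The `ORIGIN` host and the `INJECTED` stylesheet path are placeholders, and the query string is ignored for brevity. Lambda@Edge also limits the size of a generated response, so large pages may not fit.

```python
import re
import urllib.request

ORIGIN = "https://example.atlassian.net"  # assumption: placeholder Confluence Cloud host
INJECTED = '<link rel="stylesheet" href="/custom/site.css">'  # hypothetical asset


def inject_into_head(html, snippet):
    """Insert snippet just before the first </head>; leave html alone if absent."""
    match = re.search(r"</head>", html, re.IGNORECASE)
    if match is None:
        return html
    return html[: match.start()] + snippet + html[match.start():]


def fetch(url):
    """Fetch a URL and return its body as text (assumes UTF-8 HTML)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def handler(event, context, fetch_page=fetch):
    """Origin-request handler that returns a generated response."""
    request = event["Records"][0]["cf"]["request"]
    # Only the path is used here; a fuller version would forward the query string too.
    html = fetch_page(ORIGIN + request["uri"])
    return {
        "status": "200",
        "statusDescription": "OK",
        "headers": {
            "content-type": [
                {"key": "Content-Type", "value": "text/html; charset=utf-8"}
            ],
        },
        "body": inject_into_head(html, INJECTED),
    }
```

The `fetch_page` parameter exists only so the handler can be exercised locally with a fake fetcher before it is attached to CloudFront.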

So I took a working Lambda function and attached it to CloudFront under the origin request event, but all I got was a CloudFront permission error! There was no information on why it failed.

Lambda@Edge sure felt very rough at this stage. Nothing turned up. No logs. No events. It was hard to troubleshoot the issue.

Update: I just realised that my Lambda functions were configured to run by default without a VPC. I wonder if that was the issue.

AWS Elastic Beanstalk Deployment

I decided to take a step back and deploy this last bit of functionality onto an EC2 instance. I did not need a particularly high-performing instance: all it needed to do was proxy calls and transform responses. I opted for the smallest and cheapest instance, a t2.nano that would cost USD 4.32 a month. The response could then be cached by CloudFront, further reducing the load on the instance.

I did not, however, go for the full raw EC2 setup. Like I said, I wanted to get things up fast, so I used AWS Elastic Beanstalk to deploy my Node.js application, which provisioned the EC2 instance, auto-scaling group and Elastic Load Balancer, to name a few. The website was taking shape quite well at this point.

An alternative would be to use AWS API Gateway and return an HTML response, but I already had a Node.js implementation ready. API Gateway with Lambda might be a cheaper approach though, given that the free tier comes with 1 million free requests. It was something I could try further down the road.
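
For a sense of what that alternative involves, here is a minimal sketch of a Lambda handler behind an API Gateway proxy integration that answers with HTML. The page content is a stand-in. A real version would fetch and transform the Confluence page instead.

```python
import html


def handler(event, context):
    """Lambda proxy-integration handler that answers with an HTML page."""
    # Escape the path before echoing it, since it comes straight from the request.
    path = html.escape(event.get("path", "/"))
    page = f"<!doctype html><html><body><h1>Requested {path}</h1></body></html>"
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html; charset=utf-8"},
        "body": page,
    }
```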

AWS Certificate Manager - A Simple and Good Option

The site now looked and worked fine on Chrome, but when I tried Mobile Safari or Safari, the site broke. It looked to be an issue with serving the website via HTTP. When I tested with a local Node.js proxy over HTTPS, everything worked fine. I needed to get the site on HTTPS.

SSL certificates were traditionally expensive and tedious to manage. For this round though, I decided to use AWS Certificate Manager, and provisioned a free certificate for my website. Onboarding was really trivial: I just needed to verify via email that I was the owner of the domain. Attaching the certificate to AWS CloudFront was a snap too. I would strongly recommend this to everyone, as long as they do not need EV or OV certificates.

AWS Web Application Firewall - Not for the Faint of Heart

About a day after putting the site up, Elastic Beanstalk reported that the service health was degraded. I restarted it, and about an hour later the same issue happened.

Looking through the logs, I saw numerous suspicious calls from a user-agent named "Mozilla Jorgee". Because the user-agent was probing for pages that did not exist, numerous HTTP 500 errors were returned. This might have led Elastic Beanstalk to conclude that the service was unhealthy and bring down the connection. Interestingly, auto-scaling never kicked in, because the default trigger (network traffic over a specified threshold) was never met.
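
A quick way to spot this kind of traffic is to count requests per user-agent and status code. The sketch below assumes nginx-style "combined" access log lines, and the sample lines are made up for illustration. Adjust the pattern to whatever format your logs actually use.

```python
import re
from collections import Counter

# Assumption: nginx-style "combined" access log lines.
LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarise(lines):
    """Count requests per (user-agent, status) pair, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            counts[(match.group("agent"), match.group("status"))] += 1
    return counts


# Made-up sample lines for illustration only.
sample = [
    '203.0.113.9 - - [01/Oct/2017:10:00:00 +0000] "GET /phpmyadmin/ HTTP/1.1" 500 0 "-" "Mozilla Jorgee"',
    '203.0.113.9 - - [01/Oct/2017:10:00:01 +0000] "HEAD /mysql/ HTTP/1.1" 500 0 "-" "Mozilla Jorgee"',
    '198.51.100.7 - - [01/Oct/2017:10:00:02 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Macintosh)"',
]

# Print the noisiest (user-agent, status) pairs first.
for (agent, status), hits in summarise(sample).most_common():
    print(f"{hits:4d}  {status}  {agent}")
```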

So I started to look at how I could protect the deployment with a web application firewall. The natural default and preferred choice for now would be AWS WAF.

My previous experience had been with F5 Silverline, a managed WAF service. Imagine my dismay when I started to create a new web ACL for my website, and it was ‘naked’ without any rules! I was expecting options to guard against the common OWASP Top 10 exploits, or ready-made rules that I could enable or disable at will!

After investigating further, it turned out that there were a few templates available to provision such common rules, but they had to be provisioned via CloudFormation. Subsequent updates would require re-provisioning.

Tedious.

My recommendation at this point would be that unless you have an in-house team that is security and AWS savvy enough, go for a managed WAF service. Sure, being able to automate WAF provisioning or even having WAF rules as code is nice, but you absolutely do not want to be the person setting up WAF rules late at night when a zero-day vulnerability is discovered.

An additional protection was to block direct internet access to the Elastic Beanstalk application. One option was to create security groups that only allowed access from the CloudFront IP ranges (but as far as the internet says, those ranges can change from time to time, which makes this a brittle approach). The other was to have CloudFront pass a custom header to the origin, and have the origin reject any request that lacked it. For now, I chose the latter as it was easier to implement.
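
On the origin side, the custom header check can be very small. The sketch below is one way to do it, not the exact code running on the site: the header name `x-origin-verify` and the secret value are hypothetical, and in practice the secret would come from environment configuration rather than source code.

```python
import hmac

HEADER_NAME = "x-origin-verify"  # hypothetical header configured on the CloudFront origin
SHARED_SECRET = "replace-with-a-long-random-value"  # assumption: loaded from config in practice


def is_from_cloudfront(headers, secret=SHARED_SECRET):
    """Return True only when the request carries the expected custom header value."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(HEADER_NAME, "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), secret.encode())


def guard(headers):
    """Map the check to the HTTP status the proxy would return."""
    return 200 if is_from_cloudfront(headers) else 403
```

Anyone who learns the header value can bypass the check, so it should be treated like a password and rotated if it leaks.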

StatusCake - Monitoring for the Website Availability

AWS offered CloudWatch for infrastructure and service monitoring, but my preference had been to use StatusCake, a simple but extremely effective monitoring tool. With its free tier, you could set up HTTP webpage monitoring as often as every 5 minutes from any location in the world. If the website went down, email (free), SMS (paid) or even chat (free) alerts would be triggered. I strongly encourage everyone to give it a try if they need to monitor their website availability!

Final Architecture

And so this was what I ended up with. More complex than I originally envisioned.

Cloudflare instead of AWS CloudFront?

As of Sep 2017, Cloudflare had just announced a beta of Service Workers@Edge, which offered similar functionality to Lambda@Edge using JavaScript functions. Based on their examples, it looked like it could transform responses! However, it was not clear if it was a free or paid offering.

Cloudflare also had a free tier for CDN and DDoS protection, but customers needed to step up to the paid tier for WAF.

It also offered free SSL certificates.

Perhaps it might be a more viable option in the future.

Constraints and Limitations of Current Implementation

While the site worked well running off Confluence, it did have a few limitations:

  • The website would not be 100% mobile friendly.
  • The existing AWS implementation still cost me some money.

Confluence Cloud did have RESTful APIs which one could use to get page content. Perhaps the next experiment would be to use scheduled events with Lambda to cache that content into S3 and DynamoDB, and implement Lambda functions via API Gateway to serve up dynamic content, with static HTML pages in S3. This should cut the cost down significantly, since DynamoDB, S3 and Lambda all offer a certain amount of usage in the free tier.
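
As a rough sketch of the caching step, the code below builds a content-listing URL for the Confluence Cloud REST API and turns the JSON payload into static HTML documents keyed by an S3-style object key. It assumes the `/wiki/rest/api/content` endpoint with `expand=body.view`, and it leaves out authentication, paging and the actual S3 upload. The `pages/<id>.html` key layout is my own invention.

```python
import html
from urllib.parse import urlencode


def content_url(base_url, space_key, limit=25):
    """Build the content-listing URL for one space, asking for rendered HTML bodies."""
    query = urlencode({"spaceKey": space_key, "expand": "body.view", "limit": limit})
    return f"{base_url}/wiki/rest/api/content?{query}"


def render_pages(payload):
    """Turn a content-listing payload into {object_key: html_document}."""
    pages = {}
    for item in payload.get("results", []):
        # body.view.value holds the rendered HTML when expand=body.view is requested.
        body = item.get("body", {}).get("view", {}).get("value", "")
        title = html.escape(item.get("title", ""))
        # Hypothetical key layout for the static copies.
        pages[f"pages/{item['id']}.html"] = (
            f"<!doctype html><html><head><title>{title}</title></head>"
            f"<body>{body}</body></html>"
        )
    return pages
```

A scheduled Lambda could call `content_url`, fetch the JSON, and write each entry returned by `render_pages` to S3.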

Future Architecture

Or I could even migrate the website to Ghost, a blogging platform that I had always been attracted to, once I had had my fun building AWS-native applications.

Update: The Confluence Cloud approach might not be a good idea after all. In the current implementation, the URLs did not adhere to the Confluence Cloud structure. I had avoided making too many changes, like URL rewriting, in case the JavaScript or images stopped working. However, this was making Google Analytics very hard to decipher!

Update: If you can, apply tags to your AWS resources so that you can get a sense of the cost of the deployment. Note also that for Elastic Beanstalk, you would not be able to tag an existing environment. I found that out the hard way! Also, a single WAF rule was costing me USD 1.32 a month!