A Well-Architected Home(page)
By Nahuel Oyhanarte, EDRANS Cloud Engineer
At Edrans, we always try to follow best practices and industry standards, both for our clients’ projects and for our own internal ones. These best practices include (but are not limited to):
- automation: everything should be automated and easily deployed or replicated, even in a different environment, with as few manual steps as possible
- CI/CD: goes hand in hand with the previous one, with a focus on continuous integration, delivery, deployment, and even testing + rollback if necessary
- collaboration: a good product or service needs close, agile, and real collaboration between everyone involved, with the sole focus on getting things (well) done
- code: following the IaaC (Infrastructure as Code) paradigm, everything we do should be easily reproducible, meaning versioning the infra itself and having clearly defined specs
- WAF: every design should consider the 5 pillars of the AWS Well-Architected Framework
So, this is the story of how we applied those principles at home while (re-)building our own corporate site: https://edrans.com
Every story starts with an introduction, a context, even a past from which it derives; this one shall be no exception…
What was the problem then?
Our website was aging fast, didn’t match new UX/UI paradigms or web tech standards, and couldn’t keep up with our growth. Additionally, as it was built and managed by an external agency, we started to feel a bit constrained regarding updates and modifications… being an agile company working in an agile industry, you can imagine how much agility means to us.
Therefore, the requirements were:
- Build a new website from scratch, leveraging our own team of designers, developers, and sysadmins.
- Regain full control over its design, content, and implementation details, so we can quickly update or modify it in the future.
- Define everything as code, as stated by IaaC.
- Follow the best practices stated before, doing it the way it should be ™.
- Tick all the fancy buzzwords: elastic, self-healing, automated, and highly available.
- Have different environments (as identical to each other as possible) to develop, test, and serve the site.
From day zero, devs and ops collaborated closely (in a true “DevOps” way), so the conversations revolved around:
- Which tech stack should we use?
- How are we going to architect it?
- What’s the best way to auto-deploy it?
We’ll focus on the “infrastructure” part for now, and leave the “development” part for another edition.
One of the first decisions we made was going for containers, which give us better resource utilization, infra parity between the different environments, and the ability to deploy quickly and automatically (including rollback).
Therefore we needed a way to deploy containers into the cloud with as little maintenance overhead as possible, while keeping costs under control.
At the same time, as in every critical and well-designed cloud project, scalability and high availability were crucial considerations.
Options here include:
- Manually installing Docker on EC2 instances (cumbersome, prone to errors, a lot of manual work).
- Using a managed container service, like ECS or Fargate (relatively cheap, automatic updates and deploys, little maintenance needed).
- Going full length with an orchestrator like Mesos, Swarm, or Kubernetes (more expensive, a lot of moving parts, hard to operate).
As you can already imagine, we decided to go for Fargate: it removes the need to provision and manage servers, scales in and out quickly and flexibly, and, being serverless, carries no maintenance burden or over-provisioning (you only pay for what you use).
For this case we also decided to make one (and only one) Docker image, pushing it to the ECR repo and deploying from there to both envs we have: STG and PRD.
For the database, the choice was a bit more straightforward: if you have relational databases in AWS you really can’t go wrong by picking RDS (or, even better if your engine is supported, Aurora).
There are only a few cases when RDS/Aurora may not fit well:
- When you need full control and flexibility over your databases.
- When you have unsupported configurations.
- When you want to use RAID on EBS volumes.
- When you need manual replication.
… or some other very specific use cases.
Overall, the simplicity, scalability, and high availability provided by RDS/Aurora made this choice a no-brainer.
For the production environment, we deployed 2 database instances under an Aurora cluster for High Availability, with the replica (or reader) also being used for read-only queries to increase the performance of the application and allow more concurrent requests.
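To illustrate the reader/writer split, here is a minimal Python sketch (the site itself runs on PHP, so this is not the actual implementation). The endpoint hostnames and the `pick_endpoint` helper are hypothetical; the only idea taken from the text is that read-only queries can go to the reader while everything else goes to the writer.

```python
# Hypothetical placeholders: an Aurora cluster exposes a writer endpoint and a
# reader endpoint, but these hostnames are made up for the example.
WRITER_ENDPOINT = "writer.example.internal"
READER_ENDPOINT = "reader.example.internal"


def pick_endpoint(sql: str) -> str:
    """Return the endpoint a statement should be sent to.

    Plain SELECTs go to the reader; anything that may write or take locks
    (including SELECT ... FOR UPDATE) goes to the writer.
    """
    # Collapse whitespace and upper-case so the checks below are simple.
    normalized = " ".join(sql.upper().split())
    is_plain_select = normalized.startswith("SELECT") and "FOR UPDATE" not in normalized
    return READER_ENDPOINT if is_plain_select else WRITER_ENDPOINT
```

Keep in mind that a replica can lag slightly behind the writer, so a read that must see a write made a moment ago is safer on the writer endpoint.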
One of the most common ways of improving website performance is to cache everything you can (so it’s served fast) and be as near as possible to the end-user (so, again, you can serve it fast!).
In the cloud world this is usually achieved by using a CDN, a network of distributed servers (sometimes called “edge locations”) around the world that perform both functions: maintain a “hot” copy of the files (especially those static or infrequently updated ones) and serve them from a closer location (in the physical world and until we beat light speed, the distance usually equals latency).
Therefore, with traffic coming from all over the world, several static resources to deliver on each request, and information that doesn’t change that often (we don’t manage stock, for example), a CDN like CloudFront was a natural fit.
- Dynamic: for serving dynamic content directly from the ALB and the Fargate containers behind it (with Apache and PHP doing their magic).
The not-so-fun part of configuring a CDN is dealing with cache optimization and expiration (by properly configuring HTTP headers, TTLs, cookies, and query strings), but that’s for another blog post.
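As a small taste of what that involves, the sketch below picks a `Cache-Control` header by file type. It is only an illustration: the extension list and the TTL values are assumptions made for the example, not the settings used on the site.

```python
from pathlib import PurePosixPath

# Assumed values for illustration only: long-lived caching for static assets,
# short-lived caching for everything else.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".svg", ".woff2"}
STATIC_MAX_AGE = 60 * 60 * 24 * 30  # 30 days, in seconds
DYNAMIC_MAX_AGE = 60                # 1 minute, in seconds


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value based on the file extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        return f"public, max-age={STATIC_MAX_AGE}"
    return f"public, max-age={DYNAMIC_MAX_AGE}"
```

A CDN generally honors headers like these when deciding how long to keep an object at the edge, which is why getting them right matters so much.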
Following the twelve-factor app, the configuration should be environment agnostic and separated from the code. To achieve this we used AWS SSM Parameter Store, so we can manage the parameters needed by the application without having to hardcode them inside it.
How does this work?
- The values are populated via Terraform and versioned in its .tf files which are of course committed to a git repo (this includes sensitive information encrypted with KMS using sops).
- Fargate is configured to inject these as environment variables into the containers.
- The application is able to configure itself accordingly, consuming the proper information.
It’s worth mentioning that the same Docker image is used everywhere (locally, in staging, and in production), thereby achieving real environment parity (another of the 12 factors), with the only difference being its configuration.
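The last step of that flow, the application configuring itself from the injected environment variables, could look roughly like the sketch below. It is written in Python for illustration (the site itself uses PHP), and the variable names and the `AppConfig` fields are hypothetical, since the article does not list the real parameters.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; the real parameter names are not given here.
REQUIRED_VARS = ("APP_ENV", "DB_HOST", "DB_USER", "DB_PASSWORD")


@dataclass(frozen=True)
class AppConfig:
    app_env: str
    db_host: str
    db_user: str
    db_password: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the configuration from environment variables, failing fast if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing configuration: {', '.join(missing)}")
    return AppConfig(
        app_env=env["APP_ENV"],
        db_host=env["DB_HOST"],
        db_user=env["DB_USER"],
        db_password=env["DB_PASSWORD"],
    )
```

Because nothing environment-specific lives in the image, the same build behaves as staging or production depending only on what is injected at start-up.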
E. Monitoring / Alerting / Logging
We’re always trying to use as many AWS services as possible, and this topic was no exception.
Therefore, we make heavy use of the CloudWatch integration with Fargate, RDS, CloudFront, and the other services we use to build a nice dashboard for the environment, so we can monitor the service live based on several metrics, such as:
- CloudFront metrics
- Fargate metrics
- ALB metrics
- RDS metrics
This allows us to know at a glance what’s happening with the website: whether the autoscaling is working properly and whether there is an issue we have to fix or debug, among other things.
Additionally, some specific alerts are pushed to Slack and to our email (via SNS), so we are notified right away when something needs immediate attention.
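How an alert travels from SNS to Slack is not covered here, but the formatting step could be sketched as follows. This assumes the alert is a CloudWatch alarm notification, whose JSON body includes `AlarmName`, `NewStateValue`, and `NewStateReason`, and that the target is a Slack incoming webhook, which accepts a JSON payload with a `text` field. The function name and the message layout are made up for the example.

```python
import json


def slack_payload_from_alarm(sns_message: str) -> dict:
    """Turn a CloudWatch alarm notification (JSON text delivered by SNS)
    into a Slack incoming-webhook payload."""
    alarm = json.loads(sns_message)
    name = alarm.get("AlarmName", "unknown alarm")
    state = alarm.get("NewStateValue", "UNKNOWN")
    reason = alarm.get("NewStateReason", "")

    # Short prefix so the state is visible at a glance in the channel.
    prefix = {"ALARM": "[ALARM]", "OK": "[OK]"}.get(state, "[INFO]")
    text = f"{prefix} {name} is {state}"
    if reason:
        text += f": {reason}"
    return {"text": text}
```

The returned dictionary would then be sent as the JSON body of an HTTP POST to the webhook URL.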
The final high-level infra design representing the previous decisions would be like this:
The code itself was divided into two different repositories: one for the application code (where mainly developers commit) and another one for the infrastructure code (aimed especially at SREs).
As both repositories have different purposes and don’t share the same lifecycle, we made an independent pipeline for each of them, so even though they’re obviously connected, we have an isolated way of deploying both the app and the infra.
The pipeline for deploying or updating the application is based on Bitbucket Pipelines, configured to run directly from the repository. When a change is pushed to either the stg or the prd branch, a new build is triggered, the resulting image is uploaded to the proper ECR repository, and ECS Fargate then automatically deploys it to the corresponding environment.
- The first step is building the Docker image: here it’ll fetch all dependencies (for PHP using Composer and for Node.js using npm) and configure Apache and PHP.
- After that, it will copy the compiled files (CSS and JS) from the Docker image to S3 so they can be served directly through CloudFront. The same is done with the images (imgs).
- Next, it will push the image to ECR, create a new task definition for ECS, and deploy it to Fargate.
- Finally, whether the pipeline ran successfully or not, it will send a message to a Slack channel, letting us know its status.
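The control flow of those steps (run in order, stop at the first failure, always notify) can be sketched in a few lines of Python. This is not the Bitbucket Pipelines configuration itself; `run_pipeline`, the step names, and the message wording are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], notify: Callable[[str], None]) -> bool:
    """Run steps in order, stop at the first failure, and always notify."""
    ok = True
    message = "Pipeline succeeded"
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            ok = False
            message = f"Pipeline failed at step '{name}': {exc}"
            break  # later steps (e.g. the deploy) must not run after a failure
    notify(message)  # sent whether the run succeeded or not
    return ok
```

Here `notify` stands in for whatever posts to the Slack channel; passing it in as a function keeps the sketch easy to try out without any network access.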
Similar to the application one, the infra pipeline is based on Bitbucket Pipelines but is triggered from a different repo; every time a change is pushed to any branch, we automatically run the following tests and checks:
- tfsec: uses static analysis of the Terraform code to spot potential security issues.
- tflint: linter focused on possible errors, best practices, etc.
- fmt: checks the style and format of the Terraform code.
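A small wrapper that runs these three checks and reports which ones failed might look like the sketch below. The exact invocations are assumptions (the tools are named above, but not their arguments), with `terraform fmt -check -recursive` standing in for the fmt step; `run_checks` is a hypothetical helper.

```python
import subprocess
from typing import Callable, List

# Assumed invocations: the tools are named in the text, the arguments are not.
CHECKS = [
    ("tfsec", ["tfsec", "."]),
    ("tflint", ["tflint"]),
    ("fmt", ["terraform", "fmt", "-check", "-recursive"]),
]


def run_checks(workdir: str = ".", run: Callable = subprocess.run) -> List[str]:
    """Run every check in `workdir` and return the names of the ones that failed."""
    failed = []
    for name, command in CHECKS:
        # A non-zero exit code means the tool found something to report.
        result = run(command, cwd=workdir)
        if result.returncode != 0:
            failed.append(name)
    return failed
```

A pipeline step could call `run_checks()` and exit with a non-zero status when the returned list is not empty, so the push is flagged before anything is applied.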
That was all for this article on how to architect, deploy and operate a corporate website on AWS, following the “DevOps” paradigm and the best practices of the industry… hope you liked it!
Do you want to know more about AWS, IaaC, CI/CD, and Containers?