Even More Serverless Dividends
Updated: Sep 23, 2021
A robust backup and recovery plan must be easy to test: in a way that closely simulates a true failure, without interrupting production workload, and without significant cost in resources or time. For example, in a previous article I outlined my discomfort with multi-availability-zone AWS RDS (spanning a database across AZs within a region), not because of the underlying software, but because there is no way to verify that it works short of the absurd premise of shutting down the production AZ.
In this series of articles explaining our transition from SaaS to serverless, the move to AWS Lambda has yielded yet another dividend: simpler testing of the disaster recovery automation that our new product, Thunder for EC2 Serverless, provides. In addition to reducing complexity, cost, and footprint, the serverless model elegantly removes much of the difficulty in failover testing that our SaaS product experienced.
Our product automates DR protection of each of your production EC2 instances by periodically snapshotting an instance’s volumes, replicating the snapshots to a remote region, and attaching a volume created from each replicated snapshot to an identical instance in the DR region. After each replication job, the DR instance is briefly powered on as a test to confirm that it is configured to start properly in a real failover.
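The snapshot, replicate, and attach cycle described above can be sketched roughly as follows. This is a minimal illustration and not the product's actual implementation: `ReplicationPlan`, `replicate_volume`, the device name, and the descriptions are hypothetical, and the two clients are assumed to be boto3 EC2 clients for the production and DR regions. A real job would also need to handle a volume left attached by a previous run and snapshots that outlast the default waiter timeouts.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPlan:
    """Hypothetical description of one protected volume (names are illustrative)."""
    source_region: str
    volume_id: str
    dr_instance_id: str
    dr_availability_zone: str
    device: str = "/dev/sdf"  # assumed device name


def replicate_volume(plan, source_ec2, dr_ec2):
    """Snapshot a volume, copy the snapshot to the DR region, attach a new volume.

    `source_ec2` and `dr_ec2` are assumed to be boto3 EC2 clients, e.g.
    boto3.client("ec2", region_name=...), for the production and DR regions.
    """
    # 1. Snapshot the production volume and wait for the snapshot to complete.
    snapshot_id = source_ec2.create_snapshot(
        VolumeId=plan.volume_id, Description="DR snapshot"
    )["SnapshotId"]
    source_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 2. Copy the snapshot; copy_snapshot is called on the destination-region client.
    copy_id = dr_ec2.copy_snapshot(
        SourceRegion=plan.source_region,
        SourceSnapshotId=snapshot_id,
        Description="DR replica",
    )["SnapshotId"]
    dr_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy_id])

    # 3. Create a volume from the replica in the DR instance's AZ and attach it.
    volume_id = dr_ec2.create_volume(
        SnapshotId=copy_id, AvailabilityZone=plan.dr_availability_zone
    )["VolumeId"]
    dr_ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    dr_ec2.attach_volume(
        Device=plan.device, InstanceId=plan.dr_instance_id, VolumeId=volume_id
    )

    # 4. Briefly power on the DR instance to confirm it starts, then stop it.
    dr_ec2.start_instances(InstanceIds=[plan.dr_instance_id])
    dr_ec2.get_waiter("instance_running").wait(InstanceIds=[plan.dr_instance_id])
    dr_ec2.stop_instances(InstanceIds=[plan.dr_instance_id])
    return volume_id
```

Passing the clients in as arguments, rather than creating them inside the function, keeps the sketch easy to exercise with stand-in objects before pointing it at real regions.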
In reality, though, we are protecting not the EC2 instances but the applications running inside them. An enhanced test would connect to the application running in the instance and issue a transaction, for example a SQL statement to a database, to confirm that the application recovered properly.
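To make the idea of a "deep" test concrete, here is a small sketch of what such a check could look like for a database. It is an illustration only: `deep_test` is a hypothetical helper, it assumes a driver that follows Python's DB-API 2.0, and an in-memory SQLite database stands in for the recovered application so the example runs anywhere.

```python
import sqlite3


def deep_test(connect, query, expect_rows=1):
    """Open a connection, run one query, and report whether the app answered.

    `connect` is any zero-argument callable returning a DB-API 2.0 connection.
    """
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute(query)
            rows = cur.fetchall()
        finally:
            conn.close()
    except Exception as exc:  # any failure means the application did not recover
        return {"ok": False, "error": str(exc), "rows": []}
    return {"ok": len(rows) >= expect_rows, "error": None, "rows": rows}


# Stand-in demonstration: SQLite in memory instead of a real DR database.
result = deep_test(lambda: sqlite3.connect(":memory:"), "SELECT 1")
print(result["ok"], result["rows"])  # prints: True [(1,)]
```

A power-on test only shows that the instance boots; a query that returns rows shows that the database process started, accepted a connection, and could read its data.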
We originally accomplished this in our SaaS product by providing a set of scripts for certain common applications such as web servers, databases, and ERP systems; a UI for configuring the necessary parameters; and logic in our SaaS solution to open a security group firewall port from our SaaS instance to the DR application. This approach was cumbersome for several reasons:
security was sometimes difficult to manage if the instance hosting our SaaS solution ran on a different subnet than the DR instance being tested, as a port over the public network would need to be opened to our SaaS instance
storing passwords to authenticate with user applications is problematic
offering support for additional applications — and updating existing scripts — was cumbersome as each user hosted their own copy
configuring the parameters for each application required a dedicated screen in the UI and was difficult to update
Now serverless strikes again, addressing these issues in a robust and maintainable way. Thunder for EC2 Serverless offers a stable of application-specific test scripts delivered as Lambda functions from S3 buckets that we host, each deployed and configured through its own CloudFormation template. This approach addresses the SaaS challenges gracefully:
the test function when deployed is connected to the VPC of the DR instance(s) it will test; the CloudFormation template also creates a dedicated security group for the function, allowing it access to the DR instance securely through the private network
passwords are stored securely in AWS Secrets Manager
the function code is hosted on our public S3 buckets, meaning we can easily add new tests or update existing ones at any time, and end users can pull down those updates at their convenience
CloudFormation templates are customized for each supported application, and the CloudFormation console does the heavy lifting of providing the UI instead of us
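As an illustration of the Secrets Manager point above, a test function might load its database credentials along these lines. This is a sketch under stated assumptions: `load_db_credentials` is a hypothetical helper, the secret is assumed to be stored as JSON, and the key names (`host`, `username`, `password`, `port`) are assumptions about how it was saved. `secrets_client` is assumed to be `boto3.client("secretsmanager")`.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch a JSON secret and return the fields a database test needs.

    The key names read from the secret are assumptions for illustration.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])

    # Fail loudly if the secret does not have the shape this sketch expects.
    missing = [k for k in ("host", "username", "password") if k not in secret]
    if missing:
        raise KeyError(f"secret {secret_id!r} is missing keys: {missing}")

    return {
        "host": secret["host"],
        "user": secret["username"],
        "password": secret["password"],
        "port": int(secret.get("port", 3306)),  # default MySQL port
    }
```

Because the function runs inside the DR VPC, it also needs a network path to Secrets Manager, typically a VPC interface endpoint or a NAT gateway; no password ever has to live in the function's code or template parameters.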
Below is the “user interface” for deploying our MySQL deep test; you will of course recognize it as a CloudFormation template:
After each job, the instance is powered on, the function is invoked, and some content is retrieved by the SQL request to confirm that the application recovered. Our product archives the output in its logs (which, as a reminder, are stored in AWS CloudWatch and retrieved through an AWS CloudFront signed URL).
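The invocation step could be sketched as below. This is not the product's code: `run_deep_test` and the event shape `{"instance_id": ...}` are hypothetical, and `lambda_client` is assumed to be `boto3.client("lambda")`. The sketch invokes the test function synchronously and returns both a pass/fail flag and the raw output so the caller can archive it.

```python
import json


def run_deep_test(lambda_client, function_name, instance_id):
    """Invoke a test function synchronously and return (passed, output)."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function's result
        # Hypothetical event shape; the real payload is not documented here.
        Payload=json.dumps({"instance_id": instance_id}).encode("utf-8"),
    )
    body = response["Payload"].read().decode("utf-8")

    # Lambda includes FunctionError in the response when the function failed.
    passed = response.get("StatusCode") == 200 and "FunctionError" not in response
    return passed, body
```

Returning the output alongside the flag means a failed test still leaves the error text available for the job log.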
A sample of the log is below:
It bears repeating our belief that cross-region failover is superior to other scenarios precisely because it can be tested in a straightforward and non-disruptive manner. Because it requires no cooperation from the primary, there is no risk of impacting production workload while “simulating” the unavailability of the primary. On top of that, the serverless infrastructure seamlessly addresses the security and supportability issues of using test scripts to deeply test the applications. A win-win scenario if ever there was one.
We are encouraged by the interest to date in our new serverless approach to disaster recovery for EC2 instances, and if you would like to beta-test our upcoming release please reach out to us at firstname.lastname@example.org