Thunder Technologies

Route 53 Revisited

The transition to serverless for our backup product forced us to jettison much of our custom SaaS infrastructure in favor of native AWS services. The snapshot scheduler (cron) became CloudWatch Events; log4j was removed in favor of CloudWatch Logs; heck, we even discarded our entire JavaScript UI in favor of CloudFormation and CloudFront.

This rethinking opened the opportunity to trim the fat even further on our product.

Removing any functionality not deemed absolutely essential would result in an even more robust and low-cost solution than the original SaaS offering.

For example, the “big red button” to fail over EC2 instances between sites is gone. All it did was power on the duplicate instances in the DR region, so why not just have the user power them on directly through the AWS console?

How about the initial provisioning of duplicate EC2 instances of your production workload in the DR region? While end users can do this manually, it can be cumbersome to identify and configure matching settings such as the image, subnet, and placement group, especially for multiple instances. We left this in because it makes a time-consuming manual process much easier.
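To give a feel for what this provisioning step involves, here is a minimal Python sketch, not our actual implementation. It assumes the boto3 response shape of `describe_instances` for the source instance; the function name `build_dr_launch_params` and the idea of passing in an already-copied DR image ID and a chosen DR subnet ID are our own illustration. It also assumes a placement group of the same name already exists in the DR region.

```python
def build_dr_launch_params(source_instance, dr_image_id, dr_subnet_id):
    """Build keyword arguments for EC2 run_instances in the DR region.

    source_instance: one entry from describe_instances()
                     ["Reservations"][i]["Instances"][j] in production.
    dr_image_id / dr_subnet_id: hypothetical inputs; image and subnet IDs
                     are region-specific, so they cannot simply be copied.
    """
    params = {
        "ImageId": dr_image_id,
        "InstanceType": source_instance["InstanceType"],
        "SubnetId": dr_subnet_id,
        "MinCount": 1,
        "MaxCount": 1,
    }
    # Carry over the placement group name only if production uses one.
    # Assumption: a group with the same name exists in the DR region.
    group_name = source_instance.get("Placement", {}).get("GroupName")
    if group_name:
        params["Placement"] = {"GroupName": group_name}
    return params


# Usage (requires boto3 and AWS credentials, so shown as a comment only):
#   dr_ec2 = boto3.client("ec2", region_name="us-west-2")
#   dr_ec2.run_instances(**build_dr_launch_params(inst, "ami-...", "subnet-..."))
```

Even this small sketch shows why doing it by hand for many instances gets tedious: every region-specific ID has to be looked up and matched for each instance.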

What we didn’t try to do is automate the duplication of the entire production infrastructure, such as NAT gateways and the like. While it is possible, the staggering number of setting combinations that any particular user might have makes duplicating them a fool’s errand for our software. Instead, this can be done manually, since it only needs to be done once, and the user obviously knows how to, say, set up a NAT gateway in the DR region, having already done so in production.

Certainly the ongoing replication of snapshots between regions, which is central to our value proposition, must remain. This is precisely the tedious but vital operation that is best left to automation software. If you don’t replicate your data regularly it will be completely lost in an outage at the primary, and no manual process will recover it.
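For readers curious about what a single cross-region copy looks like, here is a minimal Python sketch, not our product's code. It assumes a boto3 EC2 client created in the DR (destination) region, since `copy_snapshot` is issued against the destination; the function name `copy_snapshot_to_dr` and the description text are illustrative.

```python
def copy_snapshot_to_dr(dr_ec2_client, source_region, snapshot_id):
    """Copy one EBS snapshot from the production region into the DR region.

    dr_ec2_client: a boto3 EC2 client created for the DR region, e.g.
                   boto3.client("ec2", region_name="us-west-2").
    Returns the ID of the new snapshot in the DR region.
    """
    response = dr_ec2_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]


# Usage (requires boto3 and AWS credentials, so shown as a comment only):
#   dr_ec2 = boto3.client("ec2", region_name="us-west-2")
#   new_id = copy_snapshot_to_dr(dr_ec2, "us-east-1", "snap-...")
```

The call itself is simple. The value of automation lies in running it on schedule for every volume, every time, without anyone having to remember.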

Testing the DR instances regularly by, at a minimum, powering them on and off after each replication job is also best left up to software. You can’t test too often, so why would you want to spend valuable time doing this manually?
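A bare-bones version of such a power-cycle test might look like the following Python sketch. It is an illustration under assumptions rather than our implementation: it takes a boto3 EC2 client for the DR region, uses the standard `instance_status_ok` and `instance_stopped` waiters, and the function name `power_cycle_test` is our own.

```python
def power_cycle_test(ec2_client, instance_ids):
    """Power DR instances on, wait for status checks, then power them off.

    ec2_client: a boto3 EC2 client for the DR region.
    Raises whatever the waiter raises if the instances never become healthy.
    """
    ec2_client.start_instances(InstanceIds=instance_ids)
    try:
        # Wait until EC2 reports that instance and system checks pass.
        ec2_client.get_waiter("instance_status_ok").wait(InstanceIds=instance_ids)
    finally:
        # Always stop the instances again so a failed test
        # does not leave them running in the DR region.
        ec2_client.stop_instances(InstanceIds=instance_ids)
        ec2_client.get_waiter("instance_stopped").wait(InstanceIds=instance_ids)
```

Stopping the instances in a `finally` block is a deliberate choice: a test that fails should not leave the DR region running at full cost.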

Another feature required careful consideration. The original SaaS product automated the update of Route 53 DNS records on a failover, so that network clients would be redirected to the DR region’s public IPs when resolving fully qualified domain names. While this is a useful feature, it also added significant complexity to the product by requiring additional IAM permissions, front-end configuration, and error handling. Could a user update the necessary DNS records manually on a failover? The answer, my friend, is yes, most likely, so we left it as an exercise for the reader in our new serverless offering.
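For readers who would rather script that exercise than click through the console, here is a minimal Python sketch. It is not taken from our original SaaS product. It assumes simple A records, a boto3 Route 53 client, and a hosted zone ID you supply; the function names, the 60-second TTL, and the comment text are illustrative choices.

```python
def build_failover_change_batch(dr_ips_by_fqdn, ttl=60):
    """Build a Route 53 ChangeBatch pointing each FQDN at its DR public IP.

    dr_ips_by_fqdn: dict mapping a fully qualified domain name to the
                    public IP of its duplicate instance in the DR region.
    """
    changes = []
    for fqdn, ip in sorted(dr_ips_by_fqdn.items()):
        changes.append({
            # UPSERT creates the record if missing, otherwise overwrites it.
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": fqdn,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        })
    return {"Comment": "Fail over to DR region", "Changes": changes}


def apply_failover(route53_client, hosted_zone_id, dr_ips_by_fqdn):
    """Submit the change batch; route53_client is boto3.client('route53')."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_failover_change_batch(dr_ips_by_fqdn),
    )
```

Note that the caller needs permission for `route53:ChangeResourceRecordSets` on the hosted zone, which is exactly the kind of extra IAM surface we chose to keep out of the product.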

The result is the sleekest, most robust, and most cost-effective business continuity solution for AWS. It does only what it needs to, lets you do the rest, and charges a minimum accordingly. More expensive competing solutions arguably try to do too much. To paraphrase a famous singer: when you got a little, you got only a little that can break.

To find out more try our hands-on demo at or contact us at
