Preview: How to Pull Off an AWS RDS Migration with Zero Downtime

Imagine you need to migrate your AWS RDS databases from EC2-Classic to a VPC without taking them down or blocking writes for any length of time. How would you do it?

Spencer Thomas and his team at ITHAKA faced this very problem. They came up with a solution that let them move the databases while keeping them in continuous use, with no downtime. On Thursday, Feb. 23, Spencer will share his team’s solution at the AWS Michigan Meetup. If you’re in the area, we hope you’ll join us at 6:45 at the RightBrain Networks office in Ann Arbor. In the meantime, here’s a quick preview of Spencer’s presentation.

In your presentation, you’ll explain how your team migrated AWS RDS databases to a VPC with zero downtime. Why was this necessary?

We started development in AWS in 2012, and built up our infrastructure and services in EC2-Classic. In 2016, we started moving everything into a VPC. This work would provide us with increased network security, network transparency between on-site resources and resources in AWS, and access to new AWS instance types. (And, undoubtedly, other benefits I haven’t listed.)

Our main website, jstor.org, provides services to millions of users around the world, 24 hours a day. We deliver about 2.5M page views per day, and there is no real “quiet time.” In a given day, several thousand of these interactions cause a database to be updated.

When we first looked at the problem of moving our databases into the VPC, we were sure that we would have to schedule a period of at least several hours when no writes would be permitted. That would let us make a new copy of the database in the VPC and be sure that it was identical to the copy in EC2-Classic. As we looked into this idea, though, we realized that it would require changes to a fair number of user interactions, and significant work for some of them.

How did you come up with your solution?

After some investigation, we found an AWS guide to migrating databases that looked like it would work for us.

The final piece of the puzzle was determining whether we could sequence the order in which we switched applications over to the new databases, so that every “reading” application would see all the updates from every “writing” application. For one of the databases, this required briefly shutting down some background tasks, but we were able to sequence the move without any user-visible outages.[pullquote]RSVP to attend the AWS Michigan Meetup on Thurs., Feb. 23 featuring Spencer Thomas and his talk on how to pull off an AWS RDS migration with zero downtime.[/pullquote]

What were a couple of the constraints on your solution?

As mentioned above, we wanted to make the move without externally visible outages of any site functionality. (No “maintenance window.”) An obvious constraint is that we needed to ensure data correctness and (perhaps not so obviously) immediacy: some critical user workflows require that a database change made by one application be visible to other applications within seconds. I mentioned a third constraint above: when separate applications read from and write to the same database, a “reading” app must always be able to see the changes made by a “writing” app.
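The article doesn’t say which mechanism the AWS guide relies on, so what follows is only a sketch under an assumption: suppose the new VPC instance is kept in sync by one-way replication from the EC2-Classic instance. Under that assumption, the third constraint becomes an ordering problem. An app still pointed at the old instance can’t see writes made on the new one, so every app that reads a table should move before any other app that writes that table. The Python below (3.9+, using the standard-library graphlib module) groups apps into batches that could be switched together. The app names, table names, and the migration_batches helper are hypothetical illustrations, not ITHAKA’s actual setup.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical apps and tables, for illustration only.
# WRITES maps each app to the tables it writes; READS to the tables it reads.
WRITES = {
    "web": {"users"},
    "billing": {"invoices"},
}
READS = {
    "web": {"users"},
    "billing": {"users", "invoices"},
    "reports": {"invoices"},
    "search": {"users"},
}


def migration_batches(writes, reads):
    """Return lists of apps that can be repointed to the new instance together.

    Assumes one-way replication from the old instance to the new one, so an
    app still on the old instance never sees writes made on the new one.
    Every app that reads a table therefore moves before any other app that
    writes it. Raises CycleError if no such order exists.
    """
    apps = set(writes) | set(reads)
    sorter = TopologicalSorter({app: set() for app in apps})
    for writer, written in writes.items():
        for reader, read in reads.items():
            if reader != writer and written & read:
                sorter.add(writer, reader)  # reader must move before writer
    sorter.prepare()  # raises CycleError on a circular read/write dependency
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # no constraints among these apps
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(migration_batches(WRITES, READS))
# [['reports', 'search'], ['billing'], ['web']]

# Two apps that read each other's writes: no valid order exists.
try:
    migration_batches({"a": {"t1"}, "b": {"t2"}}, {"a": {"t2"}, "b": {"t1"}})
except CycleError as err:
    print("no valid order, cycle:", err.args[1])
```

Read-only apps land in the first batch, since nothing they do can be missed by anyone else. A CycleError means no ordering satisfies the rule; one way out is to pause one of the apps involved briefly while both move. The sketch also ignores replication lag, which matters for the “within seconds” immediacy constraint.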

What advice do you have for others when encountering a similar problem?

Don’t assume that your first solution is the best one!

Seek out what others have done already and take what will work for you.

Consider your constraints carefully. Some may turn out to be illusory, and some may be more important than you thought.

And, of course, try it out first in a TEST environment/scenario before you do it to your production data and applications! Run automated tests (you DO have those, right?) after each step to ensure that it doesn’t break anything (even if you’re SURE it won’t).
