Now that we have a staging and a production environment, we want to copy all production data to staging every time we push new code to staging via continuous integration. While our code generally has to be resilient to data schema changes (we use MongoDB, a NoSQL database), we don’t want to be in the business of carrying backward compatibility code for too long. Instead, we create data migrations that run after the new code has been deployed, then remove the backward compatibility parts in a future commit once we’ve made sure the data has been properly converted.
The entire continuous integration and continuous deployment process looks like this.
The interesting part is that we copy the production database to staging before code is deployed so that we can see the results of automated migrations in staging. If things go south, we can make code changes and try again with a clean copy of production data.
Push and Pull
One existing solution is data push and pull implemented here. But looking at the source code, it’s a row-by-row copy! Ouch.
Let’s write a task that will copy one MongoDB database to another using something more efficient.
Reading Heroku-San Configuration
We’re using Heroku-san, so we’ve got a heroku.yml sitting in the config folder with two values for MONGOHQ_URL, one under staging and one under production. We’ll load the file with YAML, fetch MONGOHQ_URL and parse it into parts. For those using regular expressions to parse MongoHQ URLs, pay attention: everything except the database name is just a regular piece of a URL, so a standard URL parser does the job.
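A minimal sketch of that idea, using only Ruby's standard `yaml` and `uri` libraries. The helper names `mongohq_url_for` and `parse_mongo_url` are hypothetical, the host and credentials are made up, and the nested `config` key is an assumption about how heroku.yml is laid out; adjust the lookups if your file is shaped differently.

```ruby
require 'uri'
require 'yaml'

# Hypothetical helper: look up MONGOHQ_URL for an environment.
# Assumes heroku.yml nests settings as <env> -> config -> MONGOHQ_URL.
def mongohq_url_for(config, env)
  config.fetch(env.to_s).fetch('config').fetch('MONGOHQ_URL')
end

# Hypothetical helper: returns two values, the database name and the parsed URI.
# The database name is the URL path without its leading slash; host, port,
# user and password come straight from the standard URI parser.
def parse_mongo_url(url)
  uri = URI.parse(url)
  [uri.path.sub(%r{\A/}, ''), uri]
end

# Inline YAML stands in for config/heroku.yml so the sketch is self-contained.
config = YAML.safe_load(<<~YML)
  staging:
    config:
      MONGOHQ_URL: "mongodb://user:secret@staging.example.com:10001/app_staging"
YML

db_name, uri = parse_mongo_url(mongohq_url_for(config, :staging))
puts "#{db_name} on #{uri.host}:#{uri.port} as #{uri.user}"
```

In a real task the inline YAML would be replaced by reading the heroku.yml file from the config folder.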
I *heart* functions that return two values!
MongoDB has a nifty copyDatabase (or clone) feature described here. It’s incremental, so we must drop the destination’s collections before calling it. We also have to ensure that we don’t drop system collections, otherwise our database may be rendered inaccessible.
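The "don't drop system collections" rule can be sketched as a small filter. The helper name `droppable_collections` is hypothetical, and it assumes system collections are the ones whose names start with `system.`; the actual drop and copy calls depend on the driver version you use and are left out here.

```ruby
# Hypothetical helper: given the collection names in the destination database,
# return only those that are safe to drop before copying.
# Assumes system collections are named with a "system." prefix.
def droppable_collections(names)
  names.reject { |name| name.start_with?('system.') }
end

names = %w[users system.indexes posts system.users]
puts droppable_collections(names).inspect
# prints ["users", "posts"]
```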
Easy enough. We can call this to copy a production database to staging or to a local instance (for debugging).
No Admin for you on MongoHQ
If your destination database is on MongoHQ you will get the following error.
This is because copyDatabase requires admin privileges, which MongoHQ doesn’t give co-located users. Too bad; we have to fall back to the silly mongodump and mongorestore. This has two major disadvantages: it requires a local mongo installation, and it copies a ton of data over the network from MongoHQ and then back to MongoHQ. I hope that either MongoHQ exposes this API one day or there’s a non-admin way to do this with MongoDB [SERVER-2846].
Using Mongo Dump and Restore
Falling back to mongodump and mongorestore is trivial. It hurts to do it, but it does work. Here’s the complete lib/tasks/db_copy.rake.
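The heart of such a task is two shell commands. Below is a minimal sketch of how they might be assembled from a source and a destination URL; it is an illustration under assumptions, not the file itself. The helper names (`mongo_cli_args`, `dump_command`, `restore_command`) are hypothetical, and the sketch relies on the long-standing `--host`, `--db`, `--username`, `--password`, `--out` and `--drop` flags, so check them against your installed tool versions.

```ruby
require 'uri'

# Hypothetical helper: connection flags shared by mongodump and mongorestore.
def mongo_cli_args(uri)
  host = [uri.host, uri.port].compact.join(':')
  args = ['--host', host, '--db', uri.path.sub(%r{\A/}, '')]
  args += ['--username', uri.user, '--password', uri.password] if uri.user
  args
end

# mongodump writes the database into <dir>/<database name>.
def dump_command(from_url, dir)
  ['mongodump', *mongo_cli_args(URI.parse(from_url)), '--out', dir]
end

# mongorestore reads that folder back; --drop drops each collection
# in the destination before restoring it.
def restore_command(to_url, from_url, dir)
  from_db = URI.parse(from_url).path.sub(%r{\A/}, '')
  ['mongorestore', *mongo_cli_args(URI.parse(to_url)), '--drop', File.join(dir, from_db)]
end

from = 'mongodb://user:secret@prod.example.com:10001/app_production'
to   = 'mongodb://localhost:27017/app_development'
puts dump_command(from, '/tmp/dump').join(' ')
puts restore_command(to, from, '/tmp/dump').join(' ')
```

Building each command as an array means it can be handed to `system(*command)`, which skips the shell and avoids quoting problems with passwords.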