It’s time to connect MongoDB with S3 and write a task that backs up a MongoDB database to Amazon S3. This follows a series of articles, so before you read this you might want to check these out.
We’re now reusing two pieces of code in all these tasks (I put them into s3.rake and mongohq.rake, with some bug fixes).
mongohq.rake
Given an environment, retrieve the MongoHQ URL of its database from config/heroku.yml and parse it. This returns a URL and a database name.
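As a rough illustration of the parsing step, here is a minimal sketch that uses only the Ruby standard library. The layout of config/heroku.yml shown here (an environment name, then a `config` hash holding `MONGOHQ_URL`), the sample URL, and the helper name `mongohq_settings` are all assumptions made for the example; the real file and task may look different.

```ruby
require 'uri'
require 'yaml'

# Hypothetical stand-in for config/heroku.yml; the real layout may differ.
SAMPLE_HEROKU_YML = <<~YAML
  staging:
    config:
      MONGOHQ_URL: mongodb://user:secret@staging.example.com:27017/app_staging
YAML

# Hypothetical helper: look up the MongoHQ URL for an environment and
# split it into the pieces that mongodump and mongorestore need.
def mongohq_settings(env, yaml_text)
  config = YAML.safe_load(yaml_text)
  url = config.fetch(env.to_s).fetch('config').fetch('MONGOHQ_URL')
  uri = URI.parse(url)
  {
    url: url,
    host: uri.host,
    port: uri.port,
    user: uri.user,
    password: uri.password,
    # The database name is the URL path without its leading slash.
    database: uri.path.sub(%r{\A/}, '')
  }
end

settings = mongohq_settings(:staging, SAMPLE_HEROKU_YML)
puts "#{settings[:host]}:#{settings[:port]}/#{settings[:database]}"
```

Using `fetch` rather than `[]` makes a missing environment or key fail loudly with a `KeyError` instead of passing `nil` further down the task.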
s3.rake
We only have one set of S3 keys (we call this a production set). Retrieve those keys from config/heroku.yml and open an S3Interface connection to Amazon S3.
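A minimal sketch of the key lookup might look like the following. The key names `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY`, the YAML layout, and the helper name `s3_keys` are assumptions for illustration, and the connection itself is left as a comment so the snippet runs without any gems installed.

```ruby
require 'yaml'

# Hypothetical stand-in for the production section of config/heroku.yml.
SAMPLE_S3_YML = <<~YAML
  production:
    config:
      S3_ACCESS_KEY_ID: AKIAEXAMPLE
      S3_SECRET_ACCESS_KEY: example-secret
YAML

# Hypothetical helper: always read the production key pair, whatever
# environment the task itself is running against.
def s3_keys(yaml_text)
  config = YAML.safe_load(yaml_text).fetch('production').fetch('config')
  [config.fetch('S3_ACCESS_KEY_ID'), config.fetch('S3_SECRET_ACCESS_KEY')]
end

access_key_id, secret_access_key = s3_keys(SAMPLE_S3_YML)
puts "Loaded S3 key #{access_key_id}"

# Assuming S3Interface comes from the right_aws gem, the connection
# would then be opened along these lines:
#   s3 = RightAws::S3Interface.new(access_key_id, secret_access_key)
```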
Back Up a MongoDB Database
We’re familiar with MongoDB’s mongodump and mongorestore tools. The strategy is to create a local backup, compress it and ship it to Amazon S3 into a daily folder that rotates backups: each weekday’s backup overwrites the one from the week before. This way we always have a backup for every day of the past week: the Monday backup, the Tuesday backup, etc. We’ll then copy the latest backup on Amazon itself into a monthly folder to keep forever. This keeps storage from growing without bound, as it seems silly to keep years of daily backups.
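To make the rotation concrete, here is a small sketch of how the S3 keys and the mongodump command line could be built. The folder names `daily/` and `monthly/`, the `.tar.gz` suffix, and all three helper names are assumptions for illustration, not the layout the actual task uses.

```ruby
require 'date'

# Hypothetical key layout: one slot per weekday, so the Monday backup
# overwrites the previous Monday's and at most seven daily backups exist.
def daily_backup_key(db_name, date)
  "daily/#{date.strftime('%A').downcase}/#{db_name}.tar.gz"
end

# Hypothetical key layout: one slot per calendar month, kept forever.
# Copying the latest daily backup here on each run leaves the month's
# final backup in place once the month ends.
def monthly_backup_key(db_name, date)
  "monthly/#{date.strftime('%Y-%m')}/#{db_name}.tar.gz"
end

# Build the mongodump invocation as an argument array, so it can be
# passed to system(*cmd) without going through shell quoting.
def mongodump_command(host:, port:, database:, out_dir:, username: nil, password: nil)
  cmd = ['mongodump', '--host', host, '--port', port.to_s,
         '--db', database, '--out', out_dir]
  cmd += ['--username', username, '--password', password] if username && password
  cmd
end

today = Date.today
puts daily_backup_key('app_production', today)
puts monthly_backup_key('app_production', today)
puts mongodump_command(host: 'localhost', port: 27017,
                       database: 'app_production', out_dir: '/tmp/backup').join(' ')
```

The dump directory would then be compressed (for example with `tar -czf`) and uploaded under the daily key, and the monthly copy can be made server-side on S3 so the archive is not uploaded twice.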
The complete db_backup.rake code is below. It’s an iteration on some code that @sarcilav wrote, so I can’t take all the credit. The bonus feature is the ability to back up the current environment to S3 as well as another environment remotely (e.g. back up production from the staging server).