A Rake Task for Backing Up a MongoDB Database

s3, rake, mongodb | 3/31/2011

It’s time to connect MongoDB with S3 and write a Rake task that backs up a MongoDB database to Amazon S3. This post builds on a series of earlier articles, which you may want to read first.

We’re now reusing two pieces of code across all these tasks; I’ve moved them into s3.rake and mongohq.rake with some bug fixes.

mongohq.rake

Given an environment, retrieve the MongoHQ URL of its database from config/heroku.yml and parse it. This returns a URI and the database name.
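
The relevant slice of config/heroku.yml is expected to look something like this; the key names come straight from the code, while the values are placeholders:

    production:
      config:
        MONGOHQ_URL: mongodb://user:password@flame.mongohq.com:27017/app_production
    staging:
      config:
        MONGOHQ_URL: mongodb://user:password@flame.mongohq.com:27017/app_staging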

    namespace :mongohq do
      # Look up the MongoHQ URL for an environment in config/heroku.yml.
      def get_mongohq_url(env = Rails.env)
        @@config ||= YAML.load_file(Rails.root.join("config/heroku.yml")).symbolize_keys
        config_env = @@config[env.to_sym]
        raise "missing '#{env}' section in config/heroku.yml" if config_env.nil?
        config_env["config"]["MONGOHQ_URL"]
      end

      # Split a MongoDB URL into a URI and the database name
      # (the URI path with the slashes stripped).
      def parse_mongohq_url(url)
        uri = URI.parse(url)
        [uri, uri.path.gsub("/", "")]
      end
    end
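
For illustration, here’s what the two helpers return for a hypothetical MongoHQ URL (host, credentials, and database name are made up):

    url = get_mongohq_url(:production)
    # => "mongodb://user:password@flame.mongohq.com:27017/app_production"
    uri, db_name = parse_mongohq_url(url)
    uri.host  # => "flame.mongohq.com"
    uri.port  # => 27017
    db_name   # => "app_production"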

s3.rake

We only have one set of S3 keys (we call it the production set). Retrieve those keys from config/heroku.yml and open a RightAws::S3Interface connection to Amazon S3.
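
Those keys live in the production section of the same config/heroku.yml; again, key names come from the code and the values are placeholders:

    production:
      config:
        S3_ACCESS_KEY_ID: AKIAXXXXXXXXXXXXXXXX
        S3_SECRET_ACCESS_KEY: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        S3_BUCKET: my-app-backups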

    namespace :s3 do
      # Memoized S3 connection.
      def s3i
        @@s3i ||= s3i_open
      end

      # Read the production S3 settings from config/heroku.yml.
      def s3i_config
        @@s3i_config ||= YAML.load_file(Rails.root.join("config/heroku.yml")).symbolize_keys
        s3i_config_env = @@s3i_config[:production]
        raise "missing 'production' section in config/heroku.yml" if s3i_config_env.nil?
        s3i_config_env['config']
      end

      # Open a RightAws::S3Interface connection with those keys.
      def s3i_open
        s3_key_id = s3i_config['S3_ACCESS_KEY_ID']
        s3_access_key = s3i_config['S3_SECRET_ACCESS_KEY']
        RightAws::S3Interface.new(s3_key_id, s3_access_key, { logger: Rails.logger })
      end
    end
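
A quick way to sanity-check the credentials once the tasks are loaded is to list your buckets; RightAws::S3Interface exposes list_all_my_buckets for that (the bucket name below is made up):

    s3i.list_all_my_buckets.map { |bucket| bucket[:name] }
    # => ["my-app-backups"]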

Backing Up a MongoDB Database

We’re familiar with MongoDB’s mongodump and mongorestore. The strategy is to create a local backup, compress it, and ship it to Amazon S3 into a daily folder that rotates: one backup for each day of the week (the Monday backup, the Tuesday backup, and so on). We then copy the latest backup, within Amazon S3 itself, into a monthly folder that we keep forever. This keeps storage from growing without bound; it seems silly to keep years of daily backups.
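
Concretely, with a hypothetical database named app_production, the bucket ends up laid out like this:

    db/backup/daily/Monday/app_production.tar.gz        # overwritten every Monday
    db/backup/daily/Tuesday/app_production.tar.gz       # overwritten every Tuesday
    ...
    db/backup/monthly/2011/March/app_production.tar.gz  # kept forever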

The complete db_backup.rake code is below. It’s an iteration on some code that @sarcilav wrote, so I can’t take all the credit. The bonus feature is being able to back up the current environment to S3 as well as another environment remotely (e.g., back up production from the staging server).

    namespace :db do

      namespace :production do
        desc "Back up the production MongoDB database to Amazon S3."
        task :backup => :environment do
          Rake::Task["db:backupDatabase"].execute({ env: :production })
        end
      end

      desc "Back up the current MongoDB database to Amazon S3."
      task :backup => :environment do
        Rake::Task["db:backupDatabase"].execute({ env: Rails.env.to_sym })
      end

      desc "Back up a MongoDB database to Amazon S3."
      task :backupDatabase, [:env] => :environment do |t, args|
        env = args[:env] || Rails.env
        Rails.logger.info("[#{Time.now}] db:backup started (#{env})")
        db, db_name = parse_mongohq_url(get_mongohq_url(env))
        # dump into a fresh temporary directory, one per host:port
        tmp_db_dir = File.join(Dir.tmpdir, 'db/' + db.host + "_" + db.port.to_s)
        Rails.logger.info("[#{Time.now}] clearing (#{tmp_db_dir})")
        FileUtils.rm_rf tmp_db_dir if File.directory? tmp_db_dir
        Rails.logger.info("[#{Time.now}] mongodump to (#{tmp_db_dir})")
        # assumes the credentials contain no shell metacharacters
        if db.user.blank?
          system("mongodump -h #{db.host}:#{db.port} -d #{db_name} -o #{tmp_db_dir}")
        else
          system("mongodump -h #{db.host}:#{db.port} -d #{db_name} -u #{db.user} -p #{db.password} -o #{tmp_db_dir}")
        end
        backup_name = "#{env}-#{db_name}-#{Time.now.strftime('%Y-%m-%d-%H%M%S')}"
        tmp_db_filename = File.join(tmp_db_dir, backup_name)
        Rails.logger.info("[#{Time.now}] compressing (#{tmp_db_filename}.tar.gz)")
        # -C keeps absolute paths out of the archive
        system "tar -cvf #{tmp_db_filename}.tar -C #{tmp_db_dir} #{db_name}"
        system "gzip #{tmp_db_filename}.tar"
        bucket_name = s3i_config['S3_BUCKET']
        tmp_db_filename_tar_gz = tmp_db_filename + ".tar.gz"
        # daily backup: one folder per day of the week, overwritten weekly
        daily_backup_key = "db/backup/daily/" + Time.now.strftime("%A") + "/" + db_name + ".tar.gz"
        Rails.logger.info("[#{Time.now}] uploading (#{tmp_db_filename_tar_gz}) to s3 #{bucket_name}/#{daily_backup_key}")
        s3i.put(bucket_name, daily_backup_key, File.open(tmp_db_filename_tar_gz))
        # monthly backup: a copy within S3 itself, kept forever
        monthly_backup_key = "db/backup/monthly/" + Time.now.strftime("%Y/%B") + "/" + db_name + ".tar.gz"
        Rails.logger.info("[#{Time.now}] copying to #{monthly_backup_key}")
        s3i.copy(bucket_name, daily_backup_key, bucket_name, monthly_backup_key)
        Rails.logger.info("[#{Time.now}] uploaded #{File.stat(tmp_db_filename_tar_gz).size} byte(s)")
        Rails.logger.info("[#{Time.now}] done.")
      end

    end
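
The invocations follow from the task definitions above; note that zsh users need to quote the bracket syntax:

    rake db:backup                     # back up the current environment
    rake db:production:backup          # back up production from anywhere
    rake "db:backupDatabase[staging]"  # back up an arbitrary environment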

Improvements welcome!