Daniel Doubrovkine bio photo

Daniel Doubrovkine

aka dB., @awscloud, former CTO @artsy, +@vestris, NYC

Email Twitter LinkedIn Github Strava
Creative Commons License

Now that we have a staging and a production environment, we want to copy all production data to staging at the same time as we push new code to staging via continuous integration. While our code has to generally be resilient to data schema changes – we use NoSQL MongoDB – we don’t want to be in the business of carrying backward compatibility code for too long. Instead, we create data migrations that can run after the new code has been deployed and kill the backward compatibility parts with a future commit after we’ve made sure the data has been properly converted.

The entire continuous integration and continuous deployment process looks like this.

The interesting part is that we copy the production database to staging before code is deployed so that we can see the results of automated migrations in staging. If things go south, we can make code changes and try again with a clean copy of production data.

Push and Pull

One existing solution is data push and pull implemented here. But looking at the source code, it’s a row-by-row copy! Ouch.

Let’s write a task that will copy one MongoDB database to another using something more efficient.

Reading Heroku-San Configuration

We’re using Heroku-san, so we’ve got a heroku.yml sitting in the config folder with two values for MONGOHQ_URL under staging and production. We’ll load the file with YAML, fetch MONGOHQ_URL and parse it into parts. For those using regular expressions to parse MongoHQ urls, pay attention: everything except the database name is just a regular piece of a URL.

def db_copy_load_config
  YAML.load_file(Rails.root.join("config/heroku.yml")).symbolize_keys
end

def db_copy_config
  @@config_heroku ||= db_copy_load_config
end

def get_mongohq_url(env)
  db_copy_config[env]["config"]["MONGOHQ_URL"]
end

def parse_mongodb_url(url)
  uri = URI.parse(url)
  [uri, uri.path.gsub("/", "")]
end

I *heart* functions that return two values!

Copying Databases

MongoDB has a nifty copyDatabase (or clone) feature described here. It’s incremental, so we must drop tables before calling it. We also have to ensure that we don’t drop system tables, otherwise our database may be rendered inaccessible.

desc "MongoDB database to database copy"
task :copyDatabase, [:from, :to] => :environment do |t, args|
  from, from_db_name = parse_mongodb_url(args[:from])
  to, to_db_name = parse_mongodb_url(args[:to])
  to_conn = Mongo::Connection.new(to.host, to.port)
  to_db = to_conn.db(to_db_name)
  to_db.authenticate(to.user, to.password)
  to_db.collections.select { |c| c.name !~ /system/ }.each do |c|
    c.drop
  end
  to_conn.copy_database(from_db_name, to_db_name, from.host + ":" + from.port.to_s, from.user, from.password)
end

Easy enough. We can call this to copy a production database to staging or to a local instance (for debugging).

namespace :production do
  desc "Copy production data to staging"
  task :to_staging => :environment do
    Rake::Task["db:copy:copyDatabase"].execute({ from: get_mongohq_url(:production), to: get_mongohq_url(:staging) })
  end
  desc "Copy production data to local"
  task :to_local => :environment do
    Rake::Task["db:copy:copyDatabase"].execute({ from: get_mongohq_url(:production), to: "mongodb://localhost/development" })
  end
end

No Admin for you on MongoHQ

If your destination database is on MongoHQ you will get the following error.

Database command 'copydbgetnonce' failed: {"assertion"=>"unauthorized db:admin lock type:1 client:ip",
  "assertionCode"=>10057, "errmsg"=>"db assertion failure", "ok"=>0.0}

This is because copyDatabase requires admin privileges, which MongoHQ doesn’t give co-located users. Too bad - we have to fall back to the silly mongodump and mongorestore. This has two major disadvantages: it requires a local mongo installation and copies a ton of data over the network from MongoHQ, then back to MongoHQ. I hope that either MongoHQ exposes this API one day or there’s a non-admin way to do this with MongoDB [SERVER-2846].

Using Mongo Dump and Restore

Falling back to mongodump and mongorestore is trivial. It hurts to do it, but it does work. Here’s the complete lib/tasks/db_copy.rake.

namespace :db do
 namespace :copy do

  def db_copy_load_config
   YAML.load_file(Rails.root.join("config/heroku.yml")).symbolize_keys
  end

  def db_copy_config
   @@config_heroku ||= db_copy_load_config
  end

  def get_mongohq_url(env)
   db_copy_config[env]["config"]["MONGOHQ_URL"]
  end

  def parse_mongodb_url(url)
   uri = URI.parse(url)
   [uri, uri.path.gsub("/", "")]
  end

  namespace :production do
   desc "Copy production data to staging"
   task :to_staging => :environment do
    Rake::Task["db:copy:copyDatabase"].execute({ from: get_mongohq_url(:production), to: get_mongohq_url(:staging) })
   end
   desc "Copy production data to local"
   task :to_local => :environment do
    Rake::Task["db:copy:copyDatabase"].execute({ from: get_mongohq_url(:production), to: "mongodb://localhost:27017/development" })
   end
  end

  namespace :staging do
   desc "Copy staging data to local"
   task :to_local => :environment do
    Rake::Task["db:copy:copyDatabase"].execute({ from: get_mongohq_url(:staging), to: "mongodb://localhost:27017/development" })
   end
  end

  desc "MongoDB database to database copy"
  task :copyDatabase, [:from, :to] => :environment do |t, args|
   from, from_db_name = parse_mongodb_url(args[:from])
   to, to_db_name = parse_mongodb_url(args[:to])
   # mongodump
   tmp_db_dir = File.join(Dir.tmpdir, 'db/' + from.host + "_" + from.port.to_s)
   tmp_db_name_dir = File.join(tmp_db_dir, from_db_name)
   FileUtils.rm_rf tmp_db_name_dir if File.directory? tmp_db_name_dir
   system "mongodump -h %s:%s -d %s -u %s -p%s -o %s" % [from.host, from.port, from_db_name, from.user, from.password, tmp_db_dir]
   puts "[#{Time.now}] connecting to #{to_db_name} on #{to.host}:#{to.port} as #{to.user}"
   # clear target database
   to_conn = Mongo::Connection.new(to.host, to.port)
   puts "[#{Time.now}] opening #{to_db_name} on #{to.host}:#{to.port}"
   puts "[#{Time.now}] dropping collections in #{to_db_name} on #{to.host}:#{to.port}"
   to_db = to_conn.db(to_db_name)
   to_db.authenticate(to.user, to.password) unless (to.user.nil? || to.user.blank?)
   to_db.collections.select { |c| c.name !~ /system/ }.each do |c|
    puts " [#{Time.now}] dropping #{c.name}"
    c.drop
   end
   # mongorestore
   if to.user.nil?
    system "mongorestore -h %s:%s -d %s %s" % [to.host, to.port, to_db_name, tmp_db_name_dir]
   else
    system "mongorestore -h %s:%s -d %s -u %s -p%s %s" % [to.host, to.port, to_db_name, to.user, to.password, tmp_db_name_dir]
   end
   puts "[#{Time.now}] db:copy complete"
  end

 end
end