Pushing Assets to S3 w/ Rake: Versioning and Cache Expiration

javascript, css, git, cloudfront, s3, rails, ruby | 12/10/2011

A while ago I wrote about how we package and push Rails assets to Amazon S3. We version assets with the Git hash: varying the assets by URL lets us set indefinite cache expiration and works well with a CDN. That post included a Rake task that deleted any old assets and replaced them with newer ones. It’s time for a revision with some new features.
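
To make the versioning concrete, here is a minimal sketch of how the per-deploy prefix is derived from the current Git commit; the helper name is made up for illustration, and the task further down inlines the same logic.

  # Derive the per-deploy key prefix, e.g. "assets/0a1b2c3/", from the current commit.
  # Because every deploy gets a different URL, a given file never changes once uploaded
  # and can be cached indefinitely by browsers and by CloudFront.
  def asset_key_prefix
    hash = `git rev-parse --short HEAD`.chomp
    raise 'unable to determine the git hash' if hash.empty?
    "assets/#{hash}/"
  end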

The first problem we solved is how long it takes to sync contents between a local folder and S3. The old task fetched the entire bucket file list, which grew quite a bit over time. The S3 API supports a prefix option, so we now list only the keys under assets/.

  s3i.incrementally_list_bucket(to, prefix: "assets/") do |response|
    response[:contents].each do |existing_object|
      ...
    end
  end

The second issue is asset rollback. We deploy assets to S3 and then code to Heroku, and the asset deployment deletes the old assets. There’s a small window in which we have old code and new assets, which is obviously not okay. We’re actually saved by CloudFront, which keeps a cache for extended periods of time, but the real solution is to keep two copies of the assets online: current and previous. The code decides which existing copy is the most recent one to preserve by looking at the :last_modified field of the S3 objects.
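
Stripped of the S3 plumbing, picking the copy to keep boils down to finding the hash with the newest :last_modified timestamp; a simplified sketch of that idea (the full task below does the same thing with an explicit loop):

  # existing_assets_hash maps each Git hash found in the bucket to the
  # :last_modified timestamp of an object under that hash; the newest entry is
  # the previous deploy we keep, everything else is fair game for deletion.
  previous_hash, _ = existing_assets_hash.max_by { |_hash, last_modified| last_modified }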

Here’s the task with some shortcuts; the complete task is available as a gist.

  # uploads assets to s3 under assets/githash, deletes stale assets
  task :uploadToS3, [ :to ] => :environment do |t, args|
    from = File.join(Rails.root, 'public/assets')
    to = args[:to]
    hash = (`git rev-parse --short HEAD` || "").chomp

    # collect every key already under assets/ and remember the last_modified
    # timestamp of each asset hash encountered
    logger.info("[#{Time.now}] fetching keys from #{to}")
    existing_objects_hash = {}
    existing_assets_hash = {}
    s3i.incrementally_list_bucket(to, prefix: "assets/") do |response|
      response[:contents].each do |existing_object|
        existing_objects_hash[existing_object[:key]] = existing_object
        previous_asset_hash = existing_object[:key].split('/')[1]
        existing_assets_hash[previous_asset_hash] ||= DateTime.parse(existing_object[:last_modified])
      end
    end

    # the most recently modified existing hash is the "previous" copy to keep
    logger.info("[#{Time.now}] #{existing_assets_hash.count} existing asset(s)")
    previous_hash = nil
    existing_assets_hash.each_pair do |asset_hash, last_modified|
      logger.info(" #{asset_hash} => #{last_modified}")
      previous_hash = asset_hash unless (previous_hash and existing_assets_hash[previous_hash] > last_modified)
    end
    logger.info("[#{Time.now}] keeping #{previous_hash}") if previous_hash

    # upload the current assets under assets/githash
    logger.info("[#{Time.now}] copying from #{from} to s3:#{to} @ #{hash}")
    Dir.glob(from + "/**/*").each do |entry|
      next if File::directory?(entry)
      File.open(entry) do |entry_file|
        content_options = {}
        content_options['x-amz-acl'] = 'public-read'
        content_options['content-type'] = MIME::Types.type_for(entry)[0]
        key = 'assets/'
        key += (hash + '/') if hash
        key += entry.slice(from.length + 1, entry.length - from.length - 1)
        existing_objects_hash.delete(key)
        logger.info("[#{Time.now}]  uploading #{key}")
        s3i.put(to, key, entry_file, content_options)
      end
    end

    # delete whatever wasn't just uploaded and isn't part of the previous copy
    existing_objects_hash.keys.each do |key|
      next if previous_hash and key.start_with?("assets/#{previous_hash}/")
      puts "deleting #{key}"
      s3i.delete(to, key)
    end
  end
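
The task takes the destination bucket as an argument, so a deploy runs something like rake uploadToS3[assets-bucket-name] (the bucket name here is illustrative) before pushing code to Heroku, which means the new assets are already online by the time the new code boots.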

Since we’re versioning assets with a Git hash in the URL, another improvement is to set cache expiration far into the future by adding a Cache-Control header to the content_options in the upload loop.

  content_options['cache-control'] = "public, max-age=#{365*24*60*60}"
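
A year is the conventional ceiling for far-future expiration (HTTP/1.1 discourages expiration dates more than a year out), so in practice these objects stay cached until the URL changes with the next deploy.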