A while ago I wrote about how we package and push Rails assets to Amazon S3. We version assets with the Git hash – varying the asset URL lets us set indefinite cache expiration and works well with a CDN. That post included a Rake task that deleted any old assets and replaced them with newer ones. It’s time for a revision with some new features.
The first problem we solved is how long it takes to sync contents between a local folder and S3. The old task fetched the entire bucket file list, which grew quite a bit over time. The S3 API supports a prefix option, so we now only list the keys under assets/.
s3i.incrementally_list_bucket(to, prefix: "assets/") do |response|
  response[:contents].each do |existing_object|
    ...
  end
end
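Since incrementally_list_bucket pages through the listing, the block runs once per batch of keys rather than loading the whole bucket into memory, and the prefix keeps it from ever touching keys outside assets/.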
The second issue is with asset rollback. We deploy assets to S3 and then code to Heroku. The asset deployment deletes the old assets, so there’s a small window in which we have old code and new assets, which is obviously not okay. We’re actually saved by CloudFront, which keeps a cache for extended periods of time. A better solution is to keep two copies of the assets online: current and previous. The task identifies the previous copy as the most recently uploaded set of assets by looking at the :last_modified field of each S3 object.
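In isolation, picking the previous copy comes down to finding the git hash with the latest :last_modified timestamp among the objects already in the bucket. Here is a minimal sketch of that idea with made-up data; the full task below does the same thing with an explicit loop.

require 'date'

# hypothetical listing result: git hash => last_modified of the assets uploaded under it
existing_assets_hash = {
  "a1b2c3d" => DateTime.parse("2012-05-01T10:00:00Z"),
  "e4f5a6b" => DateTime.parse("2012-05-03T09:30:00Z")
}

# the most recently uploaded set is the "previous" copy worth keeping
previous_hash, _ = existing_assets_hash.max_by { |_asset_hash, last_modified| last_modified }
previous_hash # => "e4f5a6b"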
Here’s the task with some shortcuts; the complete version is available as a gist.
# uploads assets to s3 under assets/githash, deletes stale assets
task :uploadToS3, [:to] => :environment do |t, args|
  from = File.join(Rails.root, 'public/assets')
  to = args[:to]
  hash = (`git rev-parse --short HEAD` || "").chomp
  logger.info("[#{Time.now}] fetching keys from #{to}")
  existing_objects_hash = {}
  existing_assets_hash = {}
  s3i.incrementally_list_bucket(to, prefix: "assets/") do |response|
    response[:contents].each do |existing_object|
      existing_objects_hash[existing_object[:key]] = existing_object
      previous_asset_hash = existing_object[:key].split('/')[1]
      existing_assets_hash[previous_asset_hash] ||= DateTime.parse(existing_object[:last_modified])
    end
  end
  logger.info("[#{Time.now}] #{existing_assets_hash.count} existing asset(s)")
  previous_hash = nil
  existing_assets_hash.each_pair do |asset_hash, last_modified|
    logger.info(" #{asset_hash} => #{last_modified}")
    previous_hash = asset_hash unless (previous_hash and existing_assets_hash[previous_hash] > last_modified)
  end
  logger.info("[#{Time.now}] keeping #{previous_hash}") if previous_hash
  logger.info("[#{Time.now}] copying from #{from} to s3:#{to} @ #{hash}")
  Dir.glob(from + "/**/*").each do |entry|
    next if File::directory?(entry)
    File.open(entry) do |entry_file|
      content_options = {}
      content_options['x-amz-acl'] = 'public-read'
      content_options['content-type'] = MIME::Types.type_for(entry)[0]
      key = 'assets/'
      key += (hash + '/') if hash
      key += entry.slice(from.length + 1, entry.length - from.length - 1)
      existing_objects_hash.delete(key)
      logger.info("[#{Time.now}] uploading #{key}")
      s3i.put(to, key, entry_file, content_options)
    end
  end
  existing_objects_hash.keys.each do |key|
    next if previous_hash and key.start_with?("assets/#{previous_hash}/")
    puts "deleting #{key}"
    s3i.delete(to, key)
  end
end
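Assuming the task is defined at the top level as above, deploying assets is a single rake invocation; the bucket name here is a placeholder. We run it right before pushing code to Heroku.

rake uploadToS3[my-assets-bucket]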
Since we’re versioning assets with a Git hash in the URL, another improvement is to set cache expiration to something much longer by adding a Cache-Control header to content_options.
content_options['cache-control'] = "public, max-age=#{365*24*60*60}"
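To check that the header actually made it onto the uploaded objects, a HEAD request against a deployed asset is enough. The bucket and git hash below are placeholders, and 31536000 is simply 365*24*60*60 spelled out.

require 'net/http'

# hypothetical asset URL; substitute your own bucket (or CloudFront distribution) and git hash
uri = URI("https://my-assets-bucket.s3.amazonaws.com/assets/a1b2c3d/application.css")
response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  http.head(uri.path)
end
puts response['cache-control'] # "public, max-age=31536000"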