There’s a Gem for That Too
In a previous article I described how to process an image in the Carrierwave pipeline to extract its geometry. It works very well until a large number of people start uploading high resolution images, including 100 megabyte TIFFs, all at once. I've noticed that this hogs a Heroku dyno for an extra 1 to 30 seconds, depending on the image size. Today we'll offload this processing into a queue with delayed_job. delayed_job was originally extracted from Shopify, and development has continued on a fork. We'll throw in delayed_job_mongoid, since we're a MongoDB shop.
Extracting Image Geometry
We'll use a simple example as our geometry processor. It reads the image and extracts its width and height.
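The original processor isn't reproduced here. As a dependency-free sketch, assume the upload is a PNG and read the width and height straight from its IHDR header chunk. This swaps in a hand-rolled header parser for the image library a real processor would use (one that also understands TIFF and JPEG); `png_geometry` is a hypothetical name.

```ruby
# Hypothetical, PNG-only geometry reader (assumption: input is a PNG file).
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Returns { width:, height: } for PNG bytes, or raises ArgumentError.
def png_geometry(bytes)
  bytes = bytes.b
  valid = bytes.bytesize >= 24 &&
          bytes.start_with?(PNG_SIGNATURE) &&
          bytes[12, 4] == "IHDR".b
  raise ArgumentError, "not a PNG image" unless valid

  # Layout: 8-byte signature, 4-byte chunk length, 4-byte type "IHDR",
  # then width and height as 32-bit big-endian unsigned integers.
  width, height = bytes[16, 8].unpack("N2")
  { width: width, height: height }
end
```

Usage would look like `png_geometry(File.binread("photo.png"))`. Only the first 24 bytes are needed, which is why the header is enough and the pixel data is never decoded.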
Previously we ran this method before saving the image, making our users wait. Now we would like to defer this operation until after the image has been saved. With delayed_job this is not quite as simple as calling .delay.get_geometry: the method is going to execute asynchronously, so it needs more context, such as the model to update. A variation that can actually be delayed has to receive that context as arguments.
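Why the extra context matters can be sketched in plain Ruby. delayed_job accepts any object that responds to `perform` (enqueued with `Delayed::Job.enqueue`) and stores it as YAML until a worker picks it up, so the job should carry plain data such as an id rather than a live model or an open file. `GeometryJob` and the in-memory `IMAGES` hash below are hypothetical stand-ins, and the YAML round trip only imitates what happens between enqueueing and running.

```ruby
require "yaml"

# Hypothetical stand-in for the database shared by web and worker dynos.
IMAGES = { 42 => { url: "https://example.com/42.png", processed: false } }

# Hypothetical job object: it carries only the image id.
GeometryJob = Struct.new(:image_id) do
  def perform
    image = IMAGES.fetch(image_id)  # reload the record inside the worker
    image[:processed] = true        # stands in for the real geometry extraction
  end
end

# Imitate enqueue/dequeue: dump the job to YAML, then rebuild it.
handler = YAML.dump(GeometryJob.new(42))
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load  # Psych 4 vs older
job = YAML.public_send(loader, handler)
job.perform
```

Anything that can't survive this round trip (a Tempfile handle, an in-memory upload) won't be available to the worker, which is the constraint the rest of the article runs into.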
We're passing in the id of the model to be updated and the image file that has just been uploaded. In our world we upload images to Amazon S3, so my first implementation saved the uploaded file to a temporary location, queued the job, and deleted the temporary file once processing had finished. That worked well until I deployed the application to Heroku. Can you spot my beginner mistake?
On Heroku each dyno is an independent entity, a piece of virtual hardware with its own file system, memory, and isolated disk space. The web dyno handling an upload wrote the file to its own temporary folder, and the delayed_job worker, which runs on a separate dyno, failed because it couldn't find the file. This is actually a good thing: Heroku forced us to build truly parallel software, and disk IO could easily have become our next bottleneck.
We will now write a self-contained processor that retrieves the image URL from the stored image before processing it. We'll incur the cost of re-downloading the image, but that matters less because it happens in the background.
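A minimal sketch of such a processor, assuming PNG input and a publicly readable URL (for example an S3 object URL): it needs nothing but the URL, so it works on whichever dyno runs it. `RemoteGeometry` is a hypothetical name; the download is injected as a lambda so it can be stubbed, and by default it is a plain `Net::HTTP.get`, which does not follow redirects.

```ruby
require "net/http"
require "uri"

# Hypothetical processor that re-downloads the image instead of relying on
# a file left on the uploading dyno's disk.
class RemoteGeometry
  # fetcher takes a URL and returns the response body as a String.
  def initialize(fetcher = ->(url) { Net::HTTP.get(URI(url)) })
    @fetcher = fetcher
  end

  # Returns [width, height]; PNG-only by assumption (reads the IHDR header).
  def call(url)
    bytes = @fetcher.call(url).b
    unless bytes.bytesize >= 24 && bytes[12, 4] == "IHDR".b
      raise ArgumentError, "not a PNG image"
    end
    bytes[16, 8].unpack("N2")
  end
end
```

Injecting the fetcher keeps the network call at the edge, so the geometry logic can be exercised without S3.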
Let’s trigger the geometry processor from the after_save callback of our image.
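The wiring can be sketched with a plain-Ruby model, assuming a hypothetical `Image` class whose `save` runs an `after_save` hook and a `QUEUE` array standing in for delayed_job's jobs table. The `file?` method models the `image.file?` check mentioned below; none of this is Mongoid or CarrierWave code.

```ruby
# Hypothetical stand-in for delayed_job's queue.
QUEUE = []

# Hypothetical plain-Ruby model; the real one is a Mongoid document.
class Image
  attr_accessor :id, :url, :width, :height

  def initialize(id:, url:)
    @id = id
    @url = url
    @file_just_uploaded = true  # a new Image always arrives with an upload
  end

  # Models image.file?: true only while a fresh upload is pending.
  def file?
    @file_just_uploaded
  end

  def save
    # ... persist attributes here ...
    after_save
    @file_just_uploaded = false
    true
  end

  # Like the real thing, this goes through save and therefore after_save.
  def update_attributes(attrs)
    attrs.each { |name, value| public_send("#{name}=", value) }
    save
  end

  private

  # Schedule geometry extraction for newly uploaded files only.
  def after_save
    QUEUE << [:get_geometry, id] if file?
  end
end
```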
Can you spot my second beginner mistake?
This may cause an infinite loop, since update_attributes runs callbacks, including after_save: an image gets processed, updated, and then scheduled to be processed again after the update. The only thing protecting us is the fragile image.file? check, which returns true only when a file has just been uploaded. In the real world we do a lot more in this callback, so let's suppress callbacks from within the delayed job. My favorite solution is a without_callbacks helper defined in config/initializers/without_callbacks.rb.
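The author's initializer isn't shown here, so this is only a sketch of the idea under stated assumptions: a hypothetical `WithoutCallbacks` mixin keeps a per-class, per-thread flag that the model checks before running its callbacks, and restores the previous value even if the block raises. It is not a Mongoid or ActiveSupport API.

```ruby
# Hypothetical helper, not part of Mongoid or ActiveSupport.
module WithoutCallbacks
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Runs the block with callbacks suppressed; returns the block's value.
    def without_callbacks
      previous = Thread.current[callbacks_key]
      Thread.current[callbacks_key] = true
      yield
    ensure
      Thread.current[callbacks_key] = previous
    end

    def callbacks_suppressed?
      Thread.current[callbacks_key] == true
    end

    private

    def callbacks_key
      :"#{name}_callbacks_suppressed"
    end
  end
end

# Hypothetical model opting in to the helper.
class Image
  include WithoutCallbacks

  attr_reader :width, :scheduled

  def initialize
    @scheduled = []
  end

  def update_attributes(attrs)
    @width = attrs[:width]
    # ... persist attributes here ...
    after_save unless self.class.callbacks_suppressed?
    true
  end

  private

  def after_save
    @scheduled << :get_geometry
  end
end
```

Inside the job, the update becomes `Image.without_callbacks { image.update_attributes(width: 640) }`, so saving the geometry can never schedule another geometry job.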
At last, the geometry processor can take its final shape.
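Putting the pieces together, here is a self-contained sketch of what the finished job could look like, assuming PNG input and a hypothetical in-memory `Image` model in place of the Mongoid document. The job carries only the image id, re-downloads the file from its URL, and saves the geometry with callbacks suppressed. In a real app it would be queued with `Delayed::Job.enqueue(GeometryJob.new(image.id))`; the optional `fetcher` argument exists only so the download can be stubbed, and delayed_job calls `perform` without arguments.

```ruby
require "net/http"
require "uri"

# Hypothetical in-memory model standing in for the Mongoid document.
class Image
  STORE = {}
  attr_reader :id, :url, :width, :height, :callback_runs

  def self.find(id)
    STORE.fetch(id)
  end

  # Suppress callbacks for the block, restoring the flag even on error.
  def self.without_callbacks
    previous = Thread.current[:image_callbacks_off]
    Thread.current[:image_callbacks_off] = true
    yield
  ensure
    Thread.current[:image_callbacks_off] = previous
  end

  def initialize(id, url)
    @id, @url, @callback_runs = id, url, 0
    STORE[id] = self
  end

  def update_attributes(attrs)
    @width, @height = attrs.fetch(:width), attrs.fetch(:height)
    # Counts where after_save would have re-enqueued the job.
    @callback_runs += 1 unless Thread.current[:image_callbacks_off]
    true
  end
end

# Hypothetical job: only an id crosses the queue.
GeometryJob = Struct.new(:image_id) do
  def perform(fetcher = ->(url) { Net::HTTP.get(URI(url)) })
    image = Image.find(image_id)
    bytes = fetcher.call(image.url).b
    unless bytes.bytesize >= 24 && bytes[12, 4] == "IHDR".b
      raise ArgumentError, "not a PNG image"  # PNG-only by assumption
    end
    width, height = bytes[16, 8].unpack("N2")
    Image.without_callbacks { image.update_attributes(width: width, height: height) }
  end
end
```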
To test this in a local environment, run rake jobs:work alongside rails server. On Heroku you will have to add a worker dyno by running heroku workers 1, as described in http://devcenter.heroku.com/articles/delayed-job.
System administrators will find it convenient to see delayed jobs in the UI, so we added a view and a controller to our admin interface that list them. Something similar is a good place to start.
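The view and controller themselves aren't reproduced here. As a framework-free sketch of what such a page might compute, the hypothetical `JobRow` struct mirrors a few fields delayed_job keeps for each job (`attempts`, `run_at`, `failed_at`, `last_error`); in a Rails app these would be `Delayed::Job` records loaded by an admin controller. The status labels are an illustrative choice, not delayed_job terminology.

```ruby
# Hypothetical row mirroring a few delayed_job fields.
JobRow = Struct.new(:id, :attempts, :run_at, :failed_at, :last_error)

# Classifies a job for display on an admin page.
def job_status(job, now = Time.now)
  if job.failed_at
    "failed"       # gave up after exhausting its attempts
  elsif job.last_error
    "retrying"     # failed at least once and was rescheduled
  elsif job.run_at > now
    "scheduled"
  else
    "pending"
  end
end

# One line per job, earliest run_at first, with the first line of any error.
def jobs_table(jobs, now = Time.now)
  jobs.sort_by(&:run_at).map do |job|
    error = job.last_error.to_s.lines.first.to_s.strip
    format("%-6s %-10s attempts=%d %s", job.id, job_status(job, now), job.attempts, error).rstrip
  end
end
```

Showing only the first line of `last_error` keeps the table readable, since delayed_job stores the full backtrace there.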