Consider two domain models, a Widget and a Gadget.
A long-running process runs once a day and pairs Widgets with Gadgets based on some complicated algorithm.
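The original example code isn't reproduced here, but a minimal in-memory sketch of such a job might look like this (the `pair` method and the array-backed model are illustrative stand-ins for the real algorithm and Mongoid models):

```ruby
# In-memory stand-in for a Mongoid model; illustrative only.
class WidgetAndGadget
  @collection = []

  class << self
    attr_reader :collection

    def destroy_all
      @collection.clear
    end

    def create!(widget:, gadget:)
      @collection << { widget: widget, gadget: gadget }
    end
  end
end

# Placeholder for the complicated pairing algorithm.
def pair(widgets, gadgets)
  widgets.zip(gadgets)
end

# The daily job: wipe everything, then repopulate.
def rebuild_pairs!(widgets, gadgets)
  WidgetAndGadget.destroy_all # collection is now empty ...
  pair(widgets, gadgets).each do |widget, gadget|
    WidgetAndGadget.create!(widget: widget, gadget: gadget)
  end # ... and stays incomplete until this loop finishes
end
```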
In the example above, the entire collection of WidgetAndGadget has to be destroyed before new pairs are created, which leaves it in an incomplete and unusable state until the operation finishes. Let's attempt to rewrite this implementation in a more incremental manner.
This version makes a database query per pair, and it has to fetch all existing pairs and walk the difference to destroy stale objects. That is terribly inefficient and very problematic for large data sets. Furthermore, new pairs are inserted before old pairs are destroyed, so during the pairing process the entire collection is again in an inconsistent state, unusable by our application.
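An in-memory sketch of that incremental version (names are illustrative; in the real code each membership check and each destroy is a separate database query):

```ruby
require 'set'

# collection: the existing pairs; fresh_pairs: the newly computed pairing.
def incremental_rebuild!(collection, fresh_pairs)
  fresh = fresh_pairs.to_set
  # Insert any pair that doesn't exist yet (one query per pair).
  fresh_pairs.each do |pair|
    collection << pair unless collection.include?(pair)
  end
  # At this point the collection holds both stale and new pairs,
  # so readers see an inconsistent mix until the diff below runs.
  # Fetch all pairs and destroy the ones not in the fresh set.
  collection.reject! { |pair| !fresh.include?(pair) }
  collection
end
```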
We can solve this by building each new set of pairs into a brand-new collection with the help of mongoid_collection_snapshot. The library takes care of creating a fresh collection for every snapshot and maintains a fixed number of past snapshots (two by default).
Create a new snapshot with `WidgetsAndGadgets.create!` and access the latest one with `WidgetsAndGadgets.latest`. The actual snapshotted data is available via `WidgetsAndGadgets.latest.collection_snapshot.find`; `collection_snapshot` is a `Moped::Collection`.
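Based on the gem's documented pattern, a snapshot class includes `Mongoid::CollectionSnapshot` and implements a `build` method that writes into the working snapshot collection. A sketch, where the pairing loop and the `find_gadget_for` helper are illustrative assumptions:

```ruby
class WidgetsAndGadgets
  include Mongoid::CollectionSnapshot

  def build
    # collection_snapshot is the working MongoDB collection this
    # snapshot is being built into; readers never see it half-built
    Widget.each do |widget|
      gadget = find_gadget_for(widget) # hypothetical pairing helper
      collection_snapshot.insert('widget_id' => widget.id, 'gadget_id' => gadget.id)
    end
  end
end

WidgetsAndGadgets.create!        # builds a new snapshot
pairs = WidgetsAndGadgets.latest # most recent complete snapshot
pairs.collection_snapshot.find.each { |doc| puts doc.inspect }
```

This is a model-definition fragment and needs a running MongoDB instance plus the gem to execute.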
We can turn this into a first-class Mongoid model, just like the original `WidgetAndGadget` (currently requires mongoid_collection_snapshot#10).
Instead of accessing a raw `Moped::Collection`, we get first-class Mongoid documents!
This was a bit tricky to implement. For each collection snapshot we emit a class with a different collection name passed into `store_in`.
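The technique can be sketched in plain Ruby. The `store_in` stand-in below just records the collection name so the example is self-contained; the real implementation calls Mongoid's actual `store_in` macro, and the cache keys are illustrative:

```ruby
# Stand-in for a Mongoid model; store_in here only records the name,
# where real Mongoid would point the model at that MongoDB collection.
class WidgetAndGadget
  def self.store_in(collection:)
    @collection_name = collection
  end

  def self.collection_name
    @collection_name
  end
end

# Emit (and memoize) one subclass per snapshot collection name, so each
# snapshot reads its documents through a first-class model class.
SNAPSHOT_CLASSES = Hash.new do |cache, name|
  cache[name] = Class.new(WidgetAndGadget) do
    store_in collection: name
  end
end

klass = SNAPSHOT_CLASSES['widget_and_gadget.snapshot.2']
```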
Full code for this article can be found here.
Real World Impact
I spent a day incrementally rewriting snapshot queries inside the Core API project at Artsy, which has about two dozen snapshot classes. The result was roughly half the code accomplishing the same thing, with virtually no spec changes. A very clear win.