Check out this talk recorded at OpenSearchCon 2023, or continue reading.
Most custom functionality in OpenSearch is implemented with plugins. That is, in theory. In practice, much of the core functionality is also implemented in plugins. For example, security and k-NN search are both plugins, even though one would reasonably expect a security framework to be part of the core engine (with multiple implementations in plugins), or k-NN search to live right next to full-text search. Conversely, some plugins, such as repository-s3, which reads and writes snapshots from/to Amazon S3, live in core, whereas one would expect optional functionality to be … optional. Where a feature lives is more a consequence of business and organizational decisions than of technical ones. Software architecture really does tend to line up with our business structures!
The default distribution of OpenSearch 2.10 ships with 20 plugins, all enabled by default, which erases much of the difference between what’s core and what’s a plugin. The vast majority of users install and run the whole thing.
Plugins suffer from three major limitations: rigid version compatibility, lack of isolation, and transitive dependency hell. These problems are described in great detail in this blog post, but before we go there, let’s follow another blog post and write a plugin that implements a RESTful API. The complete source code for the plugin is here.
A plugin inherits from Plugin, and our plugin implements ActionPlugin (a plugin that exposes actions via REST). Our REST handler responds to GET requests.
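For reference, here is a minimal sketch of what that looks like. The class names and the endpoint path are mine rather than copied from the linked sample, and import paths shift a little between OpenSearch versions, so treat this as an outline of the shape of the code.

```java
import java.util.List;
import java.util.function.Supplier;

import org.opensearch.client.node.NodeClient;
import org.opensearch.cluster.metadata.IndexNameExpressionResolver;
import org.opensearch.cluster.node.DiscoveryNodes;
import org.opensearch.common.settings.ClusterSettings;
import org.opensearch.common.settings.IndexScopedSettings;
import org.opensearch.common.settings.Settings;
import org.opensearch.common.settings.SettingsFilter;
import org.opensearch.core.rest.RestStatus; // org.opensearch.rest.RestStatus on older versions
import org.opensearch.plugins.ActionPlugin;
import org.opensearch.plugins.Plugin;
import org.opensearch.rest.BaseRestHandler;
import org.opensearch.rest.BytesRestResponse;
import org.opensearch.rest.RestController;
import org.opensearch.rest.RestHandler;
import org.opensearch.rest.RestRequest;

// The plugin class: extends Plugin and implements ActionPlugin to register REST handlers.
public class HelloWorldPlugin extends Plugin implements ActionPlugin {
    @Override
    public List<RestHandler> getRestHandlers(
            Settings settings,
            RestController restController,
            ClusterSettings clusterSettings,
            IndexScopedSettings indexScopedSettings,
            SettingsFilter settingsFilter,
            IndexNameExpressionResolver indexNameExpressionResolver,
            Supplier<DiscoveryNodes> nodesInCluster) {
        return List.of(new HelloWorldRestHandler());
    }
}

// The REST handler: declares a GET route and answers with a plain-text response.
class HelloWorldRestHandler extends BaseRestHandler {
    @Override
    public String getName() {
        return "hello_world_rest_handler";
    }

    @Override
    public List<Route> routes() {
        // Endpoint path is illustrative.
        return List.of(new Route(RestRequest.Method.GET, "/_plugins/hello-world"));
    }

    @Override
    protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client) {
        return channel -> channel.sendResponse(
            new BytesRestResponse(RestStatus.OK, "Hello from a plugin!"));
    }
}
```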
Let’s install the plugin, start OpenSearch, and make an HTTP request to the newly added endpoint on the OpenSearch node. The request will be forwarded to the plugin and handled by our REST handler.
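Roughly, the steps look like this; the plugin zip path and the endpoint path are placeholders for whatever the sample build produces.

```bash
# Install the locally built plugin zip, then start the node (paths are illustrative)
bin/opensearch-plugin install file:///path/to/hello-world-plugin.zip
bin/opensearch

# In another terminal, call the endpoint the plugin registered
curl localhost:9200/_plugins/hello-world
```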
How easy was it to write a plugin? Very easy! But it’s much harder to write a production plugin on top of a 1.4-million-line OpenSearch core. You will need to master dependency injection, understand OpenSearch runtime thread pools, and learn the (optional) security framework. Finally, I promise that you will have a very hard time playing nice with other plugins that share the same Java heap and execute in the same Java Virtual Machine, deployed on every node of a large-scale cluster that is actively indexing petabytes of data or serving thousands of searches per second.
What can we do about it?
In OpenSearch 2.9 we introduced a new concept called extensions and shipped an experimental OpenSearch Java SDK. Extensions are full-fledged processes: they run in a separate JVM and can execute on a separate host.
The code for an extension with its REST handler is almost identical to the code for a plugin. This is on purpose, to ease migration. The complete source code for this extension is here.
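As a rough sketch of the shape of that code, based on the SDK’s hello-world sample: the main structural difference from the plugin is that the extension has its own main() entry point and connects to OpenSearch as a separate process. Exact class names and package layout are still evolving in the experimental SDK and may differ from what is shown here.

```java
import java.io.IOException;
import java.util.List;

// Package layout may differ across SDK versions; this follows the experimental opensearch-sdk-java.
import org.opensearch.sdk.*;
import org.opensearch.sdk.rest.*;

// The extension mirrors the plugin: it declares REST handlers,
// but it runs as its own process instead of being loaded by the node.
public class HelloWorldExtension extends BaseExtension implements ActionExtension {

    public HelloWorldExtension() {
        // Host, port, and other settings come from the extension's own config file (path illustrative).
        super("/path/to/extension-settings.yml");
    }

    @Override
    public List<ExtensionRestHandler> getExtensionRestHandlers() {
        // Register REST handlers here, analogous to the plugin's REST handler.
        return List.of();
    }

    public static void main(String[] args) throws IOException {
        // The extension starts itself and talks to OpenSearch over the transport protocol.
        ExtensionsRunner.run(new HelloWorldExtension());
    }
}
```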
Let’s enable the experimental extensions feature in OpenSearch, install and run this extension.
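At the time of writing, that means turning on the experimental feature flag in opensearch.yml, starting the extension process, and then telling OpenSearch about it.

```yaml
# opensearch.yml: extensions are gated behind an experimental feature flag
opensearch.experimental.feature.extensions.enabled: true
```

With the flag set, you start the extension process first, then start OpenSearch and register the extension with the node; the exact registration mechanism (an extensions configuration file or the experimental extensions REST API) depends on the version of OpenSearch and the SDK.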
Extensions overcome the major limitations of plugins: they are semver-compatible (you can run an extension against many versions of OpenSearch without rebuilding it), they can be installed without restarting the cluster, and they are isolated at runtime. Because an extension can run remotely, you can also right-size the extension node (no need to add memory to every node in the cluster because one plugin occasionally needs it). In the introductory blog post we cut the cost of a 36-node cluster that performed high-cardinality anomaly detection by a third using extensions.
Other than reducing costs, what else can we use this technology for?
Python is the language of machine learning, and unlike plugins, extensions can also be written in Python. The complete source code for the sample below is here, and it looks very similar to the Java one.
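To give a flavor of it, here is an illustrative sketch in the style of the SDK’s hello-world sample. The class and module names below are approximations of the experimental opensearch-sdk-py API, not its exact surface, so refer to the linked source for the real thing.

```python
# Illustrative sketch only: names approximate the opensearch-sdk-py hello-world
# sample and may not match the current API exactly.
from opensearch_sdk_py import (  # hypothetical import path
    ActionExtension,
    Extension,
    ExtensionRestHandler,
    ExtensionRestResponse,
    ExtensionRunner,
    RestStatus,
)


class HelloRestHandler(ExtensionRestHandler):
    # Just like the Java handler: answer GET requests with a plain-text greeting.
    def handle(self, request) -> ExtensionRestResponse:
        return ExtensionRestResponse(RestStatus.OK, bytes("Hello from Python!\n", "utf-8"))


class HelloExtension(Extension, ActionExtension):
    def __init__(self) -> None:
        super().__init__("hello-world")

    @property
    def rest_handlers(self) -> list:
        return [HelloRestHandler()]


# The runner connects to OpenSearch over the transport protocol and serves requests.
ExtensionRunner().run(HelloExtension())
```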
Let’s enable the experimental extensions feature in OpenSearch, install and run this extension.
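Once both processes are up and the extension is registered, requests to the extension go through the OpenSearch node, which proxies them to the extension process over the transport protocol. Extension routes are namespaced by the extension’s registered unique ID, so the path below is illustrative.

```bash
# Illustrative: extension routes are exposed by the node under a per-extension prefix
curl localhost:9200/_extensions/_hello-world/hello
```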
How is this even possible? What is the bridge between OpenSearch, implemented in Java, and an extension written in Python?
With most of the heavy lifting done by Dan Widdis of OSHI fame, we reverse-engineered and then implemented the Elasticsearch/OpenSearch transport protocol in Python, then brought extension support along for the ride. The latter was easy because extension messages are all defined with protobuf, and you can compile them to Python with existing tools.
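For example, assuming you have the .proto message definitions from the OpenSearch repository checked out (the paths below are illustrative), the stock protobuf compiler generates the Python message classes.

```bash
# Compile the extension message definitions to Python classes (paths illustrative)
protoc --proto_path=server/src/main/proto \
       --python_out=generated \
       server/src/main/proto/extensions/*.proto
```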
In theory, to quote an engineer from Aryn, this opens up the entire Python model zoo-space to OpenSearch, including TensorFlow and PyTorch. But it also shows how one could implement an entire OpenSearch node in another language, one that doesn’t suffer from, for example, GC pauses.
I hope that someone reading this blog post will build a useful extension in Python for OpenSearch. Can you make it happen?