A Year Working on OpenSearch (2.0)

I’ve now been at Amazon for 3 years, and it has been a year since I joined OpenSearch, a community-driven, open-source fork of Elasticsearch and Kibana. Last week we released OpenSearch 2.0. Given that it’s already the end of May, it’s time for my first blog post of 2022.

Many things are going really well.

The first question anyone asks me is whether I am writing any code. I’ve had 484 pull requests merged into opensearch-project, out of 573 pull requests raised. (The fact that roughly one in six was not merged probably means that I don’t know what I am doing about 15% of the time.) There are tons of tiny bookkeeping changes, such as version increments, but there are also several meaty ones, half of them in opensearch-build. Turns out, continuously releasing two products (OpenSearch and OpenSearch Dashboards) with dozens of plug-ins each, for a big platform matrix (e.g. Linux, RPM/DEB, x64 and arm64), while working on 3 different releases simultaneously (right now 3.0, 2.1 and 1.3.4), along with half a dozen language clients (e.g. Java, JavaScript, Ruby, Go, and Rust) and integration tools (e.g. Logstash or Fluentd), is not easy! We ended up writing a manifest-driven build/test/release automation framework in Python to enable a release train. It worked well: whereas OpenSearch 1.0 took weeks to ship, we were able to cut 3 versions of the product during the log4j 0-day in a little longer than a weekend.
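To make "manifest-driven" a little more concrete, here is a minimal sketch of the idea, not the actual opensearch-build code. The manifest layout, the field names, and the `plan_builds` helper are all assumptions made up for illustration: a single declarative manifest lists the components and the platform matrix, and the tooling expands it into individual build steps.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Component:
    """One buildable unit listed in the manifest (hypothetical shape)."""
    name: str
    repository: str
    ref: str


# Hypothetical manifest. The field names are illustrative and are not
# claimed to match the real opensearch-build manifest schema.
MANIFEST = {
    "build": {"name": "OpenSearch", "version": "2.0.0"},
    "platforms": ["linux"],
    "architectures": ["x64", "arm64"],
    "components": [
        {
            "name": "OpenSearch",
            "repository": "https://github.com/opensearch-project/OpenSearch.git",
            "ref": "2.0",
        },
        {
            "name": "security",
            "repository": "https://github.com/opensearch-project/security.git",
            "ref": "2.0",
        },
    ],
}


def plan_builds(manifest):
    """Expand a manifest into one step per (platform, architecture, component)."""
    version = manifest["build"]["version"]
    components = [Component(**c) for c in manifest["components"]]
    steps = []
    for platform, arch in product(manifest["platforms"], manifest["architectures"]):
        for component in components:
            steps.append(
                f"{component.name}@{component.ref} -> {version}-{platform}-{arch}"
            )
    return steps


if __name__ == "__main__":
    for step in plan_builds(MANIFEST):
        print(step)
```

The appeal of this shape is that adding a plug-in, an architecture, or a new release line becomes a change to data rather than to build scripts, which is what makes a release train practical.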

The confusion between Elasticsearch and OpenSearch seems to have been cleared up, too. Occasionally, users will ask whether a new feature of Elasticsearch will be available in OpenSearch (you’re welcome to contribute features without looking at any non-Apache-licensed code). And while OpenSearch will keep improving in a thousand small ways to be a delightful, secure experience for everyone, the future of the fork is decidedly cloud-native.

What’s that all about? I work in the “Search Services” AWS organization, which builds and operates the Amazon OpenSearch Service. The folks who wrote the control plane for that service are very strong cloud engineers, and the scale of the service is remarkable. For example, in 2020 Pinterest was ingesting 1.7TB of data daily, growing to 3TB that year. Since then, data volumes haven’t grown exponentially; they have exploded. Hundreds of terabytes per day is no longer some crazy number in 2022, and you can draw a curve from there into the future. The big question now is not whether OpenSearch can support a few TB of data per day, but what OpenSearch needs to look like to support many hundreds, and how soon. We can no longer scale this monolith horizontally by adding more nodes; thus the future of OpenSearch is decidedly cloud-native. This doesn’t mean you must run it in the cloud, much less on AWS. Simply put, in a cloud-native system every aspect of the software can scale independently, and the software is readily extensible and easily multi-tenant. For example, scaling reads can happen independently of scaling writes, and search can be scaled independently from indexing. Plug-ins can run in isolation with clear, safe boundaries and interfaces, and don’t require a cluster restart. Data access is secure.

As an example of a cloud-native evolution, consider Amazon Redshift. Similarly, in OpenSearch we’ve embarked on a journey towards rethinking extensibility, storage, indexing and search. While I have not written much (or any) code in these areas, I’ve spent many hours with various Engineers brainstorming and building an OpenSearch SDK that will help decouple the engine from its extensions, refactoring and scaling storage (starting with segment replication), and much more. Most of these are not my ideas, but I believe I have been able to help folks feel safe making bigger bets and aiming high, while staying pragmatic and always writing code.

That said, don’t dismiss me too quickly as merely a cheerleader: I did help debug a customer problem in the managed service, and wrote a unit test in Lucene for the fix authored by a longtime Lucene committer and Elasticsearch Engineer.

Outside of code, I like to persevere in areas where others would not.

I helped move opensearch-plugin-template-java into the opensearch-project organization, while preserving the original author, who doesn’t work for Amazon, as an external repo administrator (a first for Amazon open-source), and worked through a process of adding external maintainers to OpenSearch project repos, merged as opensearch-project/.github#59 with 197 comments. This paved the way for our first external maintainer in OpenSearch core. In some ways these changes were hard (you know what I’m talking about if you’ve ever navigated a large organization with senior decision makers who own a significant P&L), and in other ways they were easy, because everyone at AWS wanted this. In practice, someone just needed to do it, removing obstacles one by one. I like this work and believe that enabling others always has a much bigger impact than anything I could accomplish alone.

There are also some challenges.

I often hear that Amazon isn’t contributing enough to open-source, and I prefer to acknowledge that my colleagues and I can do more. So we do. As of today, I counted 191 out of 401 contributors to OpenSearch who don’t work for Amazon, two dozen Amazon contributors to Lucene, etc.

Across my larger organization, and AWS as a whole, open-source is still considered an “upstream” activity. Engineers working on proprietary software tend to implement solutions in their own territory, and then plan to open-source some parts (they never get enough time to do it). Doing open-source is perceived as, at the very least, a time-consuming “expense”, or at most a “risk”. Neither is actually true. Open-source is cheaper to write, and solves a number of real problems: it eases access to a more diverse group of experts, improves collaboration in code, creates higher-quality software when done right, favors longer-term product and design thinking, reduces staff attrition, and improves transparency and security. Open-source software, such as the Apache-licensed OpenSearch, powers many businesses and delivers real customer value to anyone who cares to run the software. Some then choose to invest their time and money into development, while retaining the freedom to do whatever they want with the results.

I’m excited for the rest of 2022 and the OpenSearch Roadmap. See you at OpenSearchCon in Seattle on September 21!

Daniel Doubrovkine
