I’ve built a few iterations of vector search, beginning in 2011 at Artsy, powered by the Art Genome Project. Compared to today’s LLM use cases, Artsy’s is a small semantic search engine over sparse, 1,200-dimensional vectors. The first attempt at vector search was a brute-force exact k-nearest-neighbor search written in Ruby, with data stored in MongoDB. The second was an approximate nearest-neighbor implementation using LSH, and finally NN-Descent. Around 2017 we migrated to Elasticsearch, and I suspect the team has since moved to OpenSearch because it’s open-source.
Things have evolved rapidly with generative AI, so let’s try to index and search some vectors in 2023 in Python, using the simplest of the libraries, usually pure HTTP when available. You can draw your own conclusions about which engines are better and/or easier to use. Working code for this blog post is here.
Chroma

Chroma is an AI-native open-source embedding database. You can clone Chroma from GitHub and run it locally.
Chroma comes with Python and JavaScript clients, but underneath it uses a fairly straightforward HTTP interface that talks JSON. The following produces the server version number.
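For example, assuming a local Chroma server on the default port 8000 (the endpoint path may vary across Chroma versions), a version check could look like this sketch:

```python
import requests

CHROMA_URL = "http://localhost:8000"  # assumed local Chroma server


def get_version(base_url=CHROMA_URL):
    # GET /api/v1/version returns the server version as a JSON-encoded string
    response = requests.get(f"{base_url}/api/v1/version")
    response.raise_for_status()
    return response.json()
```

Calling `get_version()` against a running server returns something like `"0.4.10"`.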
You can check whether a collection exists by querying /api/v1/collections/name, but Chroma returns 500s when it doesn’t, so this gets messy. It also seems to let you refer to a collection by name or by ID, but not in all APIs, so we need the ID anyway. Let’s get it either from the list of collections or from the return value of creating one.
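One way to sidestep the 500s is to create the collection with a get_or_create flag and read the ID from the response; the payload shape below is an assumption based on Chroma’s v1 HTTP API:

```python
import requests

CHROMA_URL = "http://localhost:8000"


def create_collection_payload(name):
    # get_or_create returns the existing collection instead of failing
    return {"name": name, "get_or_create": True}


def get_or_create_collection_id(name, base_url=CHROMA_URL):
    response = requests.post(
        f"{base_url}/api/v1/collections", json=create_collection_payload(name)
    )
    response.raise_for_status()
    return response.json()["id"]  # the ID is what the other APIs want
```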
Chroma is opinionated in how it likes to receive data with arrays of IDs, embeddings, metadata, etc.
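A helper that pivots row-wise documents into the column-wise arrays Chroma expects might look like this (the /add endpoint path is an assumption):

```python
import requests


def to_chroma_columns(docs):
    # Chroma wants parallel arrays rather than a list of documents
    return {
        "ids": [doc["id"] for doc in docs],
        "embeddings": [doc["values"] for doc in docs],
        "metadatas": [doc.get("metadata", {}) for doc in docs],
    }


def add_documents(collection_id, docs, base_url="http://localhost:8000"):
    response = requests.post(
        f"{base_url}/api/v1/collections/{collection_id}/add",
        json=to_chroma_columns(docs),
    )
    response.raise_for_status()
```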
Search is similar. Chroma handles tokenization, embedding, and indexing automatically, but it also supports basic vector search with query_embeddings.
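A raw vector query posts query_embeddings to the collection’s /query endpoint; the payload shape is assumed from the v1 API:

```python
import requests


def query_payload(vector, n_results=2):
    # query_embeddings is a list of vectors, so a single query is wrapped in a list
    return {"query_embeddings": [vector], "n_results": n_results}


def query(collection_id, vector, n_results=2, base_url="http://localhost:8000"):
    response = requests.post(
        f"{base_url}/api/v1/collections/{collection_id}/query",
        json=query_payload(vector, n_results),
    )
    response.raise_for_status()
    return response.json()
```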
ClickHouse

ClickHouse is an open-source column-oriented database with experimental support for approximate nearest neighbor indexes. Create a table with a k-NN index. Note allow_experimental_annoy_index=1 in the query string, which turns on the approximate nearest neighbor index feature.
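Assuming a local ClickHouse reachable over its HTTP interface on port 8123, the DDL might look like the following; the exact annoy index syntax is version-dependent, so treat this as a sketch:

```python
import requests

CLICKHOUSE_URL = "http://localhost:8123"

CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS vectors (
    id String,
    values Array(Float32),
    INDEX values_index values TYPE annoy GRANULARITY 1000
) ENGINE = MergeTree ORDER BY id
"""


def create_table(base_url=CLICKHOUSE_URL):
    # the experimental flag is passed as a query-string setting
    response = requests.post(
        base_url,
        params={"allow_experimental_annoy_index": 1},
        data=CREATE_TABLE_SQL,
    )
    response.raise_for_status()
```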
MongoDB Atlas

MongoDB Atlas supports semantic, hybrid, and generative search, including filtering, in its serverless version. Sign up on their website and create a new database on the free tier.
Connecting to MongoDB Atlas using pymongo is similar to any MongoDB.
Create a collection.
A separate search index is needed for vector search. While indexes are attached to a collection, they also have a lifecycle of their own and take several seconds to come online.
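With a recent PyMongo (4.5+ adds create_search_index), defining a knnVector index might look like this; the mapping shape follows Atlas Search’s knnVector type, and the index and field names are assumptions:

```python
def knn_index_definition(field="values", dimensions=3, similarity="cosine"):
    # Atlas Search index definition: dimensions must match the inserted vectors
    return {
        "mappings": {
            "dynamic": False,
            "fields": {
                field: {
                    "type": "knnVector",
                    "dimensions": dimensions,
                    "similarity": similarity,
                }
            },
        }
    }


def create_search_index(collection, name="vector_index"):
    # requires PyMongo 4.5+ and an Atlas cluster
    collection.create_search_index({"name": name, "definition": knn_index_definition()})
```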
Insert vectors. Note that indexing isn’t immediate, so search results will not be available until the search index catches up.
Use an aggregation to search for vectors.
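The aggregation uses the $search stage with the knnBeta operator; the operator, index, and field names below are assumptions based on Atlas Search at the time of writing:

```python
def knn_search_pipeline(vector, index="vector_index", path="values", k=5):
    # $search must be the first stage of the aggregation pipeline
    return [
        {
            "$search": {
                "index": index,
                "knnBeta": {"vector": vector, "path": path, "k": k},
            }
        },
        {"$project": {"_id": 1, "score": {"$meta": "searchScore"}}},
    ]


def search(collection, vector, k=5):
    return list(collection.aggregate(knn_search_pipeline(vector, k=k)))
```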
A working sample that waits for the index to come online is available here.
MyScale
MyScale performs vector search in SQL and claims to outperform other solutions using a proprietary algorithm called MSTG. MyScale is built on the open-source ClickHouse, so the code is almost identical, except that one uses VECTOR INDEX values_index values TYPE MSTG.
Sign up on their website for a test cluster, and note the username and password. A working sample is available here.
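Assuming MyScale exposes the same HTTP interface as ClickHouse, secured with basic auth, the DDL and a search query might look like this sketch (the table layout and distance function are assumptions):

```python
import requests

CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS default.vectors (
    id String,
    values Array(Float32),
    CONSTRAINT check_length CHECK length(values) = 3,
    VECTOR INDEX values_index values TYPE MSTG
) ENGINE = MergeTree ORDER BY id
"""

SEARCH_SQL = """
SELECT id, distance(values, [1.0, 2.0, 3.0]) AS score
FROM default.vectors ORDER BY score LIMIT 5
"""


def execute(sql, endpoint, username, password):
    # MyScale clusters speak the ClickHouse HTTP protocol with basic auth
    response = requests.post(endpoint, auth=(username, password), data=sql)
    response.raise_for_status()
    return response.text
```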
OpenSearch
OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications licensed under Apache 2.0. You can use a managed service, such as Amazon OpenSearch, or download and install it locally. I usually do the latter, mostly because it’s trivial, and I can work offline.
Whichever option you choose, you get a single endpoint (e.g. https://localhost:9200). A local install uses basic auth and a self-signed SSL certificate, so requests need verify=False.
We can get a list of existing indexes. This is a data structure with a ton of useful information, but we’ll make a dictionary out of it, and use it to check whether an index exists.
If an index doesn’t exist, we can create one. The syntax enables k-NN vector search and includes so-called property mappings. The index also needs a fixed number of dimensions for our vectors.
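A minimal index body that enables k-NN and fixes the vector dimension might look like this (the field names are assumptions):

```python
import requests


def knn_index_body(dimension=3):
    # index.knn enables the k-NN plugin for this index
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "values": {"type": "knn_vector", "dimension": dimension},
                "metadata": {"properties": {"genre": {"type": "keyword"}}},
            }
        },
    }


def create_index(name, dimension=3, endpoint="https://localhost:9200",
                 auth=("admin", "admin")):
    # local installs use basic auth and a self-signed certificate
    response = requests.put(
        f"{endpoint}/{name}", json=knn_index_body(dimension), auth=auth, verify=False
    )
    response.raise_for_status()
```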
Indexing data can be done document-by-document or via the bulk API, which requires newline-delimited JSON. We start with some data.
You can insert document-by-document.
Or bulk insert. The bulk API wants document IDs separated from document data, so I purposely start with combined vector documents that include IDs and transform them into the newline-delimited JSON the bulk API accepts.
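The transform from combined documents to the newline-delimited bulk format might look like this:

```python
import json


def to_bulk_ndjson(index, docs):
    # each document becomes two lines: an action line carrying the ID,
    # followed by the document body without it
    lines = []
    for doc in docs:
        body = dict(doc)
        doc_id = body.pop("id")
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```

The result is POSTed to /_bulk with a Content-Type of application/x-ndjson.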
Pinecone

Pinecone is a developer-friendly, fully managed vector database that makes it easy to build high-performance vector search applications that scale without infrastructure hassles.
Conceptually it has indexes (which are really databases, and were probably originally called that, since the API has /databases in it). After signing up to Pinecone you get a regional endpoint and a project ID. These form a controller URI for database operations (e.g. https://controller.us-west4-gcp-free.pinecone.io/). After you create an index, it gets its own URI that combines the index name (e.g. “my-index”) and the project ID (e.g. https://my-index-c7556fa.svc.us-west4-gcp-free.pinecone.io). It’s not quite serverless, as you do have to reason about pods.
Authentication is performed using a required API key.
We can get a list of existing indexes. This is just a list of names, useful to check whether an index exists.
If an index doesn’t exist, we can create one. It will need to have a fixed number of dimensions for our vectors.
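Index creation goes through the controller endpoint with the API key in an Api-Key header; the payload fields below (name, dimension, metric) are assumptions from Pinecone’s v1 API:

```python
import requests

CONTROLLER_URL = "https://controller.us-west4-gcp-free.pinecone.io"


def create_index_payload(name, dimension=3, metric="cosine"):
    # the dimension is fixed at creation time and must match inserted vectors
    return {"name": name, "dimension": dimension, "metric": metric}


def create_index(name, api_key, dimension=3, controller_url=CONTROLLER_URL):
    response = requests.post(
        f"{controller_url}/databases",
        headers={"Api-Key": api_key},
        json=create_index_payload(name, dimension),
    )
    response.raise_for_status()
```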
Qdrant

Qdrant is a similarity vector search engine designed for a wide range of applications, including recommendation systems, image search, and natural language processing. It is scalable and allows dynamic updates to the index. It is particularly suitable for scenarios where the vector data is constantly evolving and vectors may be modified without interrupting the search functionality. Qdrant is licensed under Apache 2.0.
Qdrant is built upon a concept of indexes, where vectors are organized and stored in “collections” for quick retrieval. Currently, it only supports HNSW (Hierarchical Navigable Small World) as its vector index.
After you sign up at Qdrant Cloud Services, create a new free tier Qdrant Cluster with authentication. Note your cluster URL and API key. The endpoint will have the following format https://my-cluster.cloud.qdrant.io:6333/.
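With the cluster URL and an api-key header, creating a collection, upserting points, and searching are each a single HTTP call against Qdrant’s REST API; the sample sizes and distance metric are assumptions:

```python
import requests


def qdrant_headers(api_key):
    return {"api-key": api_key}


def search_payload(vector, limit=5):
    return {"vector": vector, "limit": limit}


def create_collection(base_url, api_key, name, size=3):
    # vectors are configured per collection with a fixed size and distance
    response = requests.put(
        f"{base_url}/collections/{name}",
        headers=qdrant_headers(api_key),
        json={"vectors": {"size": size, "distance": "Cosine"}},
    )
    response.raise_for_status()


def upsert_points(base_url, api_key, name, points):
    # points is a list of {"id": ..., "vector": [...], "payload": {...}}
    response = requests.put(
        f"{base_url}/collections/{name}/points",
        headers=qdrant_headers(api_key),
        json={"points": points},
    )
    response.raise_for_status()


def search(base_url, api_key, name, vector, limit=5):
    response = requests.post(
        f"{base_url}/collections/{name}/points/search",
        headers=qdrant_headers(api_key),
        json=search_payload(vector, limit),
    )
    response.raise_for_status()
    return response.json()["result"]
```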
Redis

Redis is a fast, opinionated, open-source database. Its similarity vector search comes with FLAT and HNSW indexing methods (field types). Redis is licensed under BSD.
I prefer to run Redis locally in Docker with docker run -p 6379:6379 redislabs/redisearch:latest, but managed service options with free tiers also exist.
Redis speaks RESP, which is not HTTP, hence we’re going to use redis-py.
We create an HNSW index called vectors over documents with a given doc: prefix. This is unlike other databases, where you write documents into an index.
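Using redis-py’s generic execute_command keeps this close to the raw protocol; the FT.CREATE arguments below (HNSW parameters, a TAG genre field, a three-dimensional vector) are a sketch:

```python
# arguments for FT.CREATE: an HNSW vector field plus a TAG field for filtering
FT_CREATE_ARGS = [
    "FT.CREATE", "vectors", "ON", "HASH", "PREFIX", "1", "doc:",
    "SCHEMA",
    "genre", "TAG",
    "vector", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32", "DIM", "3", "DISTANCE_METRIC", "COSINE",
]


def create_index(client):
    # client is a redis.Redis instance, e.g. redis.Redis(host="localhost", port=6379)
    client.execute_command(*FT_CREATE_ARGS)
```

The "6" after HNSW counts the attribute name/value pairs that follow it.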
Insert some vectors. Note that Redis doesn’t support nested dictionaries for metadata, so we will index and filter by genre in search.
Search. We filter by genre with @genre:{ action }. Use * instead if you don’t want filtering.
Vespa

Vespa is a fully featured search engine and vector database. It supports approximate nearest neighbor search, lexical search, and search in structured data, all in the same query. Vespa is Apache 2.0 licensed, and can be run in a variety of ways, including Docker and as a managed cloud service.
This container listens on port 8080 for search and ingestion APIs, and on 19071 for configuration APIs.
Vespa encapsulates the concept of a schema/index in an application that needs to be defined and deployed, so it is not as straightforward as the previous examples.
To create a new application with a sample vector schema we need a services.xml file with the overall application properties, and a schema.sd file with the definition of our schema. For this example, let’s create the following directory structure.
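A sketch of what such an application package might contain; the document type, the three-dimensional tensor field, and the node layout are assumptions:

```
app/
├── services.xml
└── schemas/
    └── vector.sd
```

The schema declares the vector as a fixed-size tensor attribute:

```
schema vector {
    document vector {
        field id type string {
            indexing: summary | attribute
        }
        field values type tensor<float>(x[3]) {
            indexing: summary | attribute
            attribute {
                distance-metric: euclidean
            }
        }
    }
}
```

And services.xml wires up a container for the search and document APIs plus a content cluster holding the documents:

```
<?xml version="1.0" encoding="utf-8" ?>
<services version="1.0">
    <container id="default" version="1.0">
        <search/>
        <document-api/>
    </container>
    <content id="vectors" version="1.0">
        <redundancy>1</redundancy>
        <documents>
            <document type="vector" mode="index"/>
        </documents>
        <nodes>
            <node hostalias="node1" distribution-key="0"/>
        </nodes>
    </content>
</services>
```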
Weaviate

Weaviate is a vector search engine that stores both data objects and their vector embeddings, using contextualized embeddings to understand semantic similarity. Currently it supports only Hierarchical Navigable Small World (HNSW) indexing, and building the index is comparatively costly. However, it has fast query times and high scalability. Weaviate is open-source, easy to use, flexible, extensible, and has a Contributor License Agreement.
After you sign up at Weaviate Cloud Services (WCS), create a new free tier Weaviate Cluster with authentication. Note your cluster URL and (optional) API key. The endpoint will have the format https://myindex.weaviate.network.
It is easy to create some objects with vectors.
The search is pretty straightforward. Weaviate also has a GraphQL interface.
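Against the REST API, inserting an object with an explicit vector and issuing a nearVector GraphQL query might look like this sketch (the class and property names are assumptions):

```python
import requests


def object_payload(class_name, vector, properties):
    # vectors can be supplied explicitly instead of computed by a vectorizer
    return {"class": class_name, "vector": vector, "properties": properties}


def near_vector_query(class_name, vector, limit=5):
    # a GraphQL Get query ranked by proximity to the given vector
    return {
        "query": (
            "{ Get { %s(limit: %d, nearVector: { vector: %s }) "
            "{ genre _additional { id distance } } } }"
        ) % (class_name, limit, vector)
    }


def insert_object(base_url, api_key, payload):
    response = requests.post(
        f"{base_url}/v1/objects",
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
    )
    response.raise_for_status()
    return response.json()
```

The query dict is POSTed to /v1/graphql with the same Authorization header.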
Deleting objects of the same class is straightforward.
I also wonder whether we need a generic client, agnostic to which vector DB is being used, to help make code portable. I took a stab at a very simple prototype.
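This isn’t that prototype, but a minimal sketch of what such an interface might look like, with an exact brute-force in-memory backend as a reference implementation:

```python
import math
from abc import ABC, abstractmethod


class VectorDB(ABC):
    """An engine-agnostic interface: each backend adapts these two calls."""

    @abstractmethod
    def upsert(self, doc_id, vector, metadata=None): ...

    @abstractmethod
    def search(self, vector, k=5): ...


class InMemoryVectorDB(VectorDB):
    """Exact cosine-similarity search, useful for testing portable code."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, vector, metadata=None):
        self.docs[doc_id] = (vector, metadata or {})

    def search(self, vector, k=5):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm if norm else 0.0

        # rank all documents by similarity and return the top k IDs
        scored = [(cosine(vector, v), doc_id) for doc_id, (v, _) in self.docs.items()]
        return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]
```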