code.dblock.org | tech blogJekyll2024-01-06T14:53:35+00:00https://code.dblock.org/Daniel Doubrovkinehttps://code.dblock.org/dblock@dblock.orghttps://code.dblock.org/2023/10/16/making-raw-json-rest-requests-to-opensearch2023-10-16T00:00:00+00:002023-10-16T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>OpenSearch clients implement various high-level REST DSLs to invoke OpenSearch APIs. Efforts such as <a href="https://github.com/opensearch-project/opensearch-clients/issues/19">opensearch-clients#19</a> aim to generate these from a spec so that they always stay up to date with the default distribution, including plugins. However, this is a game that cannot be won. Clients will always lag behind, and users often find themselves needing to invoke an API that their client does not support. Thus, in <a href="https://github.com/opensearch-project/opensearch-clients/issues/62">opensearch-clients#62</a> I proposed that we level up all OpenSearch language clients in their capability to make raw JSON REST requests. Your help on these issues would be very much appreciated.</p>
<p>In this post I’ll keep track of the current state of each client, with links to working samples, similar to <a href="/2022/07/11/making-sigv4-authenticated-requests-to-managed-opensearch.html">Making AWS SigV4 Authenticated Requests to Amazon OpenSearch</a>. For all of these I am running a local copy of OpenSearch 2.9 in Docker.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker run <span class="se">\</span>
<span class="nt">-p</span> 9200:9200 <span class="se">\</span>
<span class="nt">-p</span> 9600:9600 <span class="se">\</span>
<span class="nt">-e</span> <span class="s2">"discovery.type=single-node"</span> <span class="se">\</span>
opensearchproject/opensearch:2.9.0</code></pre></figure>
<h3 id="command-line">Command Line</h3>
<p>We’ll be looking for the equivalent of the four <code class="language-plaintext highlighter-rouge">GET</code>, <code class="language-plaintext highlighter-rouge">POST</code>, <code class="language-plaintext highlighter-rouge">PUT</code> and <code class="language-plaintext highlighter-rouge">DELETE</code> operations.</p>
<h4 id="curl"><a href="https://curl.se/">curl</a></h4>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl <span class="nt">-k</span> <span class="nt">-u</span> admin:admin https://localhost:9200</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"5d98546c8098"</span><span class="p">,</span><span class="w">
</span><span class="nl">"cluster_name"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"docker-cluster"</span><span class="p">,</span><span class="w">
</span><span class="nl">"cluster_uuid"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"Hu0dA0iYREiBVPqEuHqYaA"</span><span class="p">,</span><span class="w">
</span><span class="nl">"version"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"distribution"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"opensearch"</span><span class="p">,</span><span class="w">
</span><span class="nl">"number"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"2.9.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"build_type"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"tar"</span><span class="p">,</span><span class="w">
</span><span class="nl">"build_hash"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"1164221ee2b8ba3560f0ff492309867beea28433"</span><span class="p">,</span><span class="w">
</span><span class="nl">"build_date"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"2023-07-18T21:22:48.164885046Z"</span><span class="p">,</span><span class="w">
</span><span class="nl">"build_snapshot"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
</span><span class="nl">"lucene_version"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"9.7.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"minimum_wire_compatibility_version"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"7.10.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"minimum_index_compatibility_version"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"7.0.0"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"tagline"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"The OpenSearch Project: https://opensearch.org/"</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
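<p>The same check can be scripted without a client library. A minimal sketch, assuming the body of the <code class="language-plaintext highlighter-rouge">GET /</code> response above has already been fetched into a string; here it is pasted in and trimmed to the fields the hypothetical <code class="language-plaintext highlighter-rouge">describe_cluster</code> helper reads.</p>

```python
import json

# The GET / response shown above, trimmed to the fields this sketch reads.
raw = """
{
  "cluster_name" : "docker-cluster",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.9.0"
  }
}
"""


def describe_cluster(body):
    """Return '<distribution> <number>' from the body of a GET / response."""
    version = json.loads(body)["version"]
    return f"{version['distribution']} {version['number']}"


print(describe_cluster(raw))  # opensearch 2.9.0
```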
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl <span class="nt">-k</span> <span class="nt">-u</span> admin:admin <span class="se">\</span>
<span class="nt">-X</span> POST <span class="se">\</span>
<span class="nt">-H</span> <span class="s2">"Content-type:application/json"</span> <span class="se">\</span>
<span class="nt">--data</span> <span class="s1">'{"director":"Bennett Miller","title":"Moneyball","year":2011}'</span> <span class="se">\</span>
https://localhost:9200/movies/_doc/1 | jq</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"_index"</span><span class="p">:</span><span class="w"> </span><span class="s2">"movies"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_version"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"result"</span><span class="p">:</span><span class="w"> </span><span class="s2">"created"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_shards"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"total"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"successful"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"failed"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"_seq_no"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"_primary_term"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
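<p>Note <code class="language-plaintext highlighter-rouge">"total": 2, "successful": 1</code> under <code class="language-plaintext highlighter-rouge">_shards</code>: the <code class="language-plaintext highlighter-rouge">movies</code> index gets the default of one replica, which cannot be allocated on a single-node cluster, so only the primary copy is written. That is not a failure; <code class="language-plaintext highlighter-rouge">"failed": 0</code> is what matters. A minimal sketch of that check, using the response above; <code class="language-plaintext highlighter-rouge">write_succeeded</code> is a hypothetical helper.</p>

```python
import json

# The POST /movies/_doc/1 response shown above.
response = json.loads("""
{
  "_index": "movies", "_id": "1", "_version": 1, "result": "created",
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "_seq_no": 1, "_primary_term": 1
}
""")

# Values of "result" that mean the write went through.
WRITE_RESULTS = ("created", "updated", "deleted")


def write_succeeded(resp):
    """True if a document write reports a change and no failed shard copies.

    An unassigned replica lowers "successful" but does not count as "failed".
    """
    return resp.get("result") in WRITE_RESULTS and resp["_shards"]["failed"] == 0


print(write_succeeded(response))  # True
```

<p>The same helper applies unchanged to the <code class="language-plaintext highlighter-rouge">PUT</code> (<code class="language-plaintext highlighter-rouge">"updated"</code>) and <code class="language-plaintext highlighter-rouge">DELETE</code> (<code class="language-plaintext highlighter-rouge">"deleted"</code>) responses.</p>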
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl <span class="nt">-k</span> <span class="nt">-u</span> admin:admin <span class="se">\</span>
<span class="nt">-X</span> GET <span class="se">\</span>
https://localhost:9200/movies/_doc/1 | jq</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"_index"</span><span class="p">:</span><span class="w"> </span><span class="s2">"movies"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_version"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"_seq_no"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"_primary_term"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"found"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
</span><span class="nl">"_source"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"director"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Bennett Miller"</span><span class="p">,</span><span class="w">
</span><span class="nl">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Moneyball"</span><span class="p">,</span><span class="w">
</span><span class="nl">"year"</span><span class="p">:</span><span class="w"> </span><span class="mi">2011</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
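<p>For a read, the stored document comes back under <code class="language-plaintext highlighter-rouge">_source</code>. When the index exists but the document does not, OpenSearch answers with HTTP 404 and <code class="language-plaintext highlighter-rouge">"found": false</code> instead of a <code class="language-plaintext highlighter-rouge">_source</code>. A minimal sketch that handles both cases, using the response above; <code class="language-plaintext highlighter-rouge">source_or_none</code> is a hypothetical helper.</p>

```python
import json

# The GET /movies/_doc/1 response shown above.
response = json.loads("""
{
  "_index": "movies", "_id": "1", "_version": 1, "_seq_no": 2,
  "_primary_term": 1, "found": true,
  "_source": {"director": "Bennett Miller", "title": "Moneyball", "year": 2011}
}
""")


def source_or_none(resp):
    """Return the stored document from a GET _doc response, or None if it was not found."""
    return resp["_source"] if resp.get("found") else None


movie = source_or_none(response)
print(movie["title"], movie["year"])  # Moneyball 2011
```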
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl <span class="nt">-k</span> <span class="nt">-u</span> admin:admin <span class="se">\</span>
<span class="nt">-X</span> PUT <span class="se">\</span>
<span class="nt">-H</span> <span class="s2">"Content-type:application/json"</span> <span class="se">\</span>
<span class="nt">--data</span> <span class="s1">'{"director":"Bennett Miller","title":"Moneyball","year":2011}'</span> <span class="se">\</span>
https://localhost:9200/movies/_doc/1 | jq</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"_index"</span><span class="p">:</span><span class="w"> </span><span class="s2">"movies"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_version"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w">
</span><span class="nl">"result"</span><span class="p">:</span><span class="w"> </span><span class="s2">"updated"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_shards"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"total"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"successful"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"failed"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"_seq_no"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"_primary_term"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl <span class="nt">-k</span> <span class="nt">-u</span> admin:admin <span class="se">\</span>
<span class="nt">-X</span> DELETE <span class="se">\</span>
https://localhost:9200/movies/_doc/1 | jq</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"_index"</span><span class="p">:</span><span class="w"> </span><span class="s2">"movies"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_version"</span><span class="p">:</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w">
</span><span class="nl">"result"</span><span class="p">:</span><span class="w"> </span><span class="s2">"deleted"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_shards"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"total"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"successful"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"failed"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"_seq_no"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w">
</span><span class="nl">"_primary_term"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
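<p>For comparison with the client libraries below, here is a minimal sketch of the same raw requests using only Python’s standard library. It assumes the Docker container and the demo <code class="language-plaintext highlighter-rouge">admin:admin</code> credentials from above; <code class="language-plaintext highlighter-rouge">build_request</code> and <code class="language-plaintext highlighter-rouge">send</code> are illustrative helpers, not part of any OpenSearch client. <code class="language-plaintext highlighter-rouge">send</code> disables certificate verification to mirror <code class="language-plaintext highlighter-rouge">curl -k</code>, which is only acceptable against a local demo cluster.</p>

```python
import base64
import json
import ssl
import urllib.request

BASE_URL = "https://localhost:9200"  # the Docker container started above
USERNAME, PASSWORD = "admin", "admin"  # demo credentials used throughout this post


def build_request(method, path, body=None):
    """Build, but do not send, a raw JSON REST request equivalent to the curl calls."""
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}  # what curl -u sends
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON reply.

    Certificate checks are disabled to mirror curl -k; only do this against a local
    demo cluster. urlopen raises urllib.error.HTTPError for 4xx/5xx replies.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as reply:
        return json.loads(reply.read())


movie = {"director": "Bennett Miller", "title": "Moneyball", "year": 2011}
put = build_request("PUT", "/movies/_doc/1", movie)
print(put.get_method(), put.full_url)  # PUT https://localhost:9200/movies/_doc/1
# With the container running: print(send(put)["result"])
```

<p>Swapping <code class="language-plaintext highlighter-rouge">"PUT"</code> for <code class="language-plaintext highlighter-rouge">"GET"</code>, <code class="language-plaintext highlighter-rouge">"POST"</code> or <code class="language-plaintext highlighter-rouge">"DELETE"</code>, and dropping the body where curl has no <code class="language-plaintext highlighter-rouge">--data</code>, covers the other three calls.</p>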
<h3 id="java">Java</h3>
<h4 id="opensearch-java"><a href="https://github.com/opensearch-project/opensearch-java">opensearch-java</a></h4>
<p>Feature request, <a href="https://github.com/opensearch-project/opensearch-java/issues/257">opensearch-java#257</a>.</p>
<h3 id="ruby">Ruby</h3>
<h4 id="opensearch-ruby"><a href="https://github.com/opensearch-project/opensearch-ruby">opensearch-ruby</a></h4>
<p>Feature request, <a href="https://github.com/opensearch-project/opensearch-ruby/issues/209">opensearch-ruby#209</a>. Should also be possible via <code class="language-plaintext highlighter-rouge">client.perform_request</code>.</p>
<h3 id="nodejs">Node.js</h3>
<h4 id="opensearch-js"><a href="https://github.com/opensearch-project/opensearch-js">opensearch-js</a></h4>
<p>Feature request, <a href="https://github.com/opensearch-project/opensearch-js/issues/631">opensearch-js#631</a>. Should also be possible via <code class="language-plaintext highlighter-rouge">transport.request</code>.</p>
<h3 id="python">Python</h3>
<h4 id="opensearch-py"><a href="https://github.com/opensearch-project/opensearch-py">opensearch-py</a></h4>
<p>The Python client exposes <code class="language-plaintext highlighter-rouge">client.transport.perform_request</code>, which takes an HTTP method, a path, and an optional body that it serializes to JSON.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">info</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">transport</span><span class="p">.</span><span class="n">perform_request</span><span class="p">(</span><span class="s">'GET'</span><span class="p">,</span> <span class="s">'/'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Welcome to </span><span class="si">{</span><span class="n">info</span><span class="p">[</span><span class="s">'version'</span><span class="p">][</span><span class="s">'distribution'</span><span class="p">]</span><span class="si">}</span><span class="s"> </span><span class="si">{</span><span class="n">info</span><span class="p">[</span><span class="s">'version'</span><span class="p">][</span><span class="s">'number'</span><span class="p">]</span><span class="si">}</span><span class="s">!"</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">document</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">'title'</span><span class="p">:</span> <span class="s">'Moneyball'</span><span class="p">,</span>
<span class="s">'director'</span><span class="p">:</span> <span class="s">'Bennett Miller'</span><span class="p">,</span>
<span class="s">'year'</span><span class="p">:</span> <span class="s">'2011'</span>
<span class="p">}</span>
<span class="n">client</span><span class="p">.</span><span class="n">transport</span><span class="p">.</span><span class="n">perform_request</span><span class="p">(</span><span class="s">"PUT"</span><span class="p">,</span> <span class="s">"/movies/_doc/1?refresh=true"</span><span class="p">,</span> <span class="n">body</span> <span class="o">=</span> <span class="n">document</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">query</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">'size'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
<span class="s">'query'</span><span class="p">:</span> <span class="p">{</span>
<span class="s">'multi_match'</span><span class="p">:</span> <span class="p">{</span>
<span class="s">'query'</span><span class="p">:</span> <span class="s">'miller'</span><span class="p">,</span>
<span class="s">'fields'</span><span class="p">:</span> <span class="p">[</span><span class="s">'title^2'</span><span class="p">,</span> <span class="s">'director'</span><span class="p">]</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">client</span><span class="p">.</span><span class="n">transport</span><span class="p">.</span><span class="n">perform_request</span><span class="p">(</span><span class="s">"POST"</span><span class="p">,</span> <span class="s">"/movies/_search"</span><span class="p">,</span> <span class="n">body</span> <span class="o">=</span> <span class="n">query</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">client</span><span class="p">.</span><span class="n">transport</span><span class="p">.</span><span class="n">perform_request</span><span class="p">(</span><span class="s">"DELETE"</span><span class="p">,</span> <span class="s">"/movies"</span><span class="p">)</span></code></pre></figure>
<p>See the <a href="https://github.com/opensearch-project/opensearch-py/blob/main/guides/json.md">updated documentation</a> and a <a href="https://github.com/opensearch-project/opensearch-py/tree/main/samples/json">working demo</a> for more information. I also <a href="https://github.com/opensearch-project/opensearch-py/pull/544">made a PR</a> for a higher-level DSL.</p>
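<p>Building paths by hand has one pitfall: document IDs are path segments, so an ID such as <code class="language-plaintext highlighter-rouge">a/b</code> must be percent-encoded. A minimal sketch of a wrapper around <code class="language-plaintext highlighter-rouge">client.transport.perform_request</code> that does this and passes <code class="language-plaintext highlighter-rouge">refresh</code> through <code class="language-plaintext highlighter-rouge">params</code> instead of the URL. It assumes opensearch-py’s <code class="language-plaintext highlighter-rouge">Transport.perform_request(method, url, headers=None, params=None, body=None)</code> signature; <code class="language-plaintext highlighter-rouge">put_document</code> and the recording stub are hypothetical, the stub only there so the sketch runs without a cluster.</p>

```python
from urllib.parse import quote


def put_document(client, index, doc_id, document, refresh=True):
    """Index `document` under `doc_id` with a raw PUT via client.transport.perform_request.

    Index names and IDs are percent-encoded so an ID such as "a/b" stays a single
    path segment, and refresh is sent through `params` rather than pasted into the URL.
    """
    path = f"/{quote(index, safe='')}/_doc/{quote(str(doc_id), safe='')}"
    params = {"refresh": "true"} if refresh else None
    return client.transport.perform_request("PUT", path, params=params, body=document)


class RecordingTransport:
    """Hypothetical stand-in for client.transport: records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def perform_request(self, method, url, headers=None, params=None, body=None):
        self.calls.append((method, url, params, body))
        return {"result": "created"}


class StubClient:
    """Hypothetical stand-in for opensearchpy.OpenSearch, exposing only .transport."""

    def __init__(self):
        self.transport = RecordingTransport()


# Against a real cluster you would create the client with opensearch-py instead, e.g.
#   client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}],
#                       http_auth=("admin", "admin"), use_ssl=True, verify_certs=False)
client = StubClient()
put_document(client, "movies", 1, {"title": "Moneyball", "year": 2011})
print(client.transport.calls[0])
```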
<h3 id="dotnet">.NET</h3>
<h4 id="opensearch-net"><a href="https://github.com/opensearch-project/opensearch-net">opensearch-net</a></h4>
<p>Feature request, <a href="https://github.com/opensearch-project/opensearch-net/issues/403">opensearch-net#403</a>.</p>
<h3 id="rust">Rust</h3>
<h4 id="opensearch-rs"><a href="https://docs.rs/opensearch/latest/opensearch/">opensearch-rs</a></h4>
<p>The Rust client directly supports <code class="language-plaintext highlighter-rouge">JsonBody<_></code> for request bodies and <code class="language-plaintext highlighter-rouge">.json()</code> on responses.</p>
<figure class="highlight"><pre><code class="language-rust" data-lang="rust"><span class="k">let</span> <span class="n">info</span><span class="p">:</span> <span class="n">Value</span> <span class="o">=</span> <span class="n">client</span>
<span class="py">.send</span><span class="p">::</span><span class="o"><</span><span class="p">(),</span> <span class="p">()</span><span class="o">></span><span class="p">(</span><span class="nn">Method</span><span class="p">::</span><span class="n">Get</span><span class="p">,</span> <span class="s">"/"</span><span class="p">,</span> <span class="nn">HeaderMap</span><span class="p">::</span><span class="nf">new</span><span class="p">(),</span> <span class="nb">None</span><span class="p">,</span> <span class="nb">None</span><span class="p">,</span> <span class="nb">None</span><span class="p">)</span>
<span class="k">.await</span><span class="o">?</span>
<span class="nf">.json</span><span class="p">()</span>
<span class="k">.await</span><span class="o">?</span><span class="p">;</span>
<span class="nd">println!</span><span class="p">(</span>
<span class="s">"{}: {}"</span><span class="p">,</span>
<span class="n">info</span><span class="p">[</span><span class="s">"version"</span><span class="p">][</span><span class="s">"distribution"</span><span class="p">]</span><span class="nf">.as_str</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">(),</span>
<span class="n">info</span><span class="p">[</span><span class="s">"version"</span><span class="p">][</span><span class="s">"number"</span><span class="p">]</span><span class="nf">.as_str</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span>
<span class="p">);</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-rust" data-lang="rust"><span class="k">let</span> <span class="n">document</span><span class="p">:</span> <span class="n">JsonBody</span><span class="o"><</span><span class="mi">_</span><span class="o">></span> <span class="o">=</span> <span class="nd">json!</span><span class="p">({</span>
<span class="s">"title"</span><span class="p">:</span> <span class="s">"Moneyball"</span><span class="p">,</span>
<span class="s">"director"</span><span class="p">:</span> <span class="s">"Bennett Miller"</span><span class="p">,</span>
<span class="s">"year"</span><span class="p">:</span> <span class="s">"2011"</span>
<span class="p">})</span><span class="nf">.into</span><span class="p">();</span>
<span class="n">client</span><span class="nf">.send</span><span class="p">(</span>
<span class="nn">Method</span><span class="p">::</span><span class="n">Put</span><span class="p">,</span>
<span class="s">"movies/_doc/1"</span><span class="p">,</span>
<span class="nn">HeaderMap</span><span class="p">::</span><span class="nf">new</span><span class="p">(),</span>
<span class="nf">Some</span><span class="p">(</span><span class="o">&</span><span class="p">[(</span><span class="s">"refresh"</span><span class="p">,</span> <span class="s">"true"</span><span class="p">)]),</span>
<span class="nf">Some</span><span class="p">(</span><span class="n">document</span><span class="p">),</span>
<span class="nb">None</span><span class="p">,</span>
<span class="p">)</span><span class="k">.await</span><span class="o">?</span><span class="p">;</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-rust" data-lang="rust"><span class="k">let</span> <span class="n">query</span><span class="p">:</span> <span class="n">JsonBody</span><span class="o"><</span><span class="mi">_</span><span class="o">></span> <span class="o">=</span> <span class="nd">json!</span><span class="p">({</span>
<span class="s">"size"</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
<span class="s">"query"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"multi_match"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"query"</span><span class="p">:</span> <span class="s">"miller"</span><span class="p">,</span>
<span class="s">"fields"</span><span class="p">:</span> <span class="p">[</span><span class="s">"title^2"</span><span class="p">,</span> <span class="s">"director"</span><span class="p">]</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">})</span><span class="nf">.into</span><span class="p">();</span>
<span class="k">let</span> <span class="n">search_response</span> <span class="o">=</span> <span class="n">client</span><span class="nf">.send</span><span class="p">(</span>
<span class="nn">Method</span><span class="p">::</span><span class="n">Post</span><span class="p">,</span>
<span class="o">&</span><span class="s">"/movies/_search"</span><span class="p">,</span>
<span class="nn">HeaderMap</span><span class="p">::</span><span class="nf">new</span><span class="p">(),</span>
<span class="nn">Option</span><span class="p">::</span><span class="o"><&</span><span class="p">()</span><span class="o">></span><span class="p">::</span><span class="nb">None</span><span class="p">,</span>
<span class="nf">Some</span><span class="p">(</span><span class="n">query</span><span class="p">),</span>
<span class="nb">None</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">.await</span><span class="o">?</span><span class="p">;</span>
<span class="k">let</span> <span class="n">search_result</span> <span class="o">=</span> <span class="n">search_response</span><span class="py">.json</span><span class="p">::</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="p">()</span><span class="k">.await</span><span class="o">?</span><span class="p">;</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"Hits: {:#?}"</span><span class="p">,</span> <span class="n">search_result</span><span class="p">[</span><span class="s">"hits"</span><span class="p">][</span><span class="s">"hits"</span><span class="p">]</span><span class="nf">.as_array</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">());</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-rust" data-lang="rust"><span class="n">client</span><span class="py">.send</span><span class="p">::</span><span class="o"><</span><span class="p">(),</span> <span class="p">()</span><span class="o">></span><span class="p">(</span>
<span class="nn">Method</span><span class="p">::</span><span class="n">Delete</span><span class="p">,</span>
<span class="s">"/movies"</span><span class="p">,</span>
<span class="nn">HeaderMap</span><span class="p">::</span><span class="nf">new</span><span class="p">(),</span>
<span class="nb">None</span><span class="p">,</span>
<span class="nb">None</span><span class="p">,</span>
<span class="nb">None</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">.await</span><span class="o">?</span><span class="p">;</span></code></pre></figure>
<p>See the <a href="https://github.com/opensearch-project/opensearch-rs/blob/main/USER_GUIDE.md#make-raw-json-requests">updated user guide</a>, <a href="https://github.com/opensearch-project/opensearch-rs/blob/main/opensearch/examples/json.rs">a working demo</a> and an <a href="https://github.com/dblock/opensearch-rust-client-demo/compare/raw-json?expand=1">API vs. raw JSON diff</a> for more information.</p>
<h3 id="php">PHP</h3>
<h4 id="opensearch-php"><a href="https://github.com/opensearch-project/opensearch-php">opensearch-php</a></h4>
<p>Feature request, <a href="https://github.com/opensearch-project/opensearch-php/issues/166">opensearch-php#166</a>.</p>
<h3 id="go">Go</h3>
<h4 id="opensearch-go"><a href="https://github.com/opensearch-project/opensearch-go">opensearch-go</a></h4>
<p>Feature request, <a href="https://github.com/opensearch-project/opensearch-go/issues/395">opensearch-go#395</a>.</p>
<p><a href="https://code.dblock.org/2023/10/16/making-raw-json-rest-requests-to-opensearch.html">Making Raw JSON REST Requests to OpenSearch</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on October 16, 2023.</p>https://code.dblock.org/2023/09/29/how-to-ingest-a-pdf-document-into-opensearch-with-ingest-attachment2023-09-29T00:00:00+00:002023-09-29T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>Extracting text from PDF documents at ingest time is a neat feature available in OpenSearch via the optional <code class="language-plaintext highlighter-rouge">ingest-attachment</code> plugin, which is installed by default on AWS-managed domains.</p>
<p>Download OpenSearch, install the <code class="language-plaintext highlighter-rouge">ingest-attachment</code> plugin, and start it.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">wget https://artifacts.opensearch.org/releases/bundle/opensearch/2.10.0/opensearch-2.10.0-linux-x64.tar.gz
<span class="nb">tar </span>vfxz opensearch-2.10.0-linux-x64.tar.gz
<span class="nb">cd </span>opensearch-2.10.0/
./bin/opensearch-plugin <span class="nb">install </span>ingest-attachment
./opensearch-tar-install.sh</code></pre></figure>
<p>I’m using OpenSearch 2.10.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl <span class="nt">-u</span> admin:admin <span class="nt">-k</span> https://localhost:9200 | jq</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ip-172-31-42-1"</span><span class="p">,</span><span class="w">
</span><span class="nl">"cluster_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"opensearch"</span><span class="p">,</span><span class="w">
</span><span class="nl">"cluster_uuid"</span><span class="p">:</span><span class="w"> </span><span class="s2">"gm4le40_R1eKzSDukDFWkA"</span><span class="p">,</span><span class="w">
</span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"distribution"</span><span class="p">:</span><span class="w"> </span><span class="s2">"opensearch"</span><span class="p">,</span><span class="w">
</span><span class="nl">"number"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2.10.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"build_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"tar"</span><span class="p">,</span><span class="w">
</span><span class="nl">"build_hash"</span><span class="p">:</span><span class="w"> </span><span class="s2">"eee49cb340edc6c4d489bcd9324dda571fc8dc03"</span><span class="p">,</span><span class="w">
</span><span class="nl">"build_date"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2023-09-20T23:54:29.889267151Z"</span><span class="p">,</span><span class="w">
</span><span class="nl">"build_snapshot"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
</span><span class="nl">"lucene_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"9.7.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"minimum_wire_compatibility_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"7.10.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"minimum_index_compatibility_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"7.0.0"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"tagline"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The OpenSearch Project: https://opensearch.org/"</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
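<p>Besides the version, it is worth confirming that the plugin was actually loaded, for example with <code class="language-plaintext highlighter-rouge">curl -k -u admin:admin "https://localhost:9200/_cat/plugins?format=json"</code>, which returns one row per node and plugin with <code class="language-plaintext highlighter-rouge">name</code>, <code class="language-plaintext highlighter-rouge">component</code> and <code class="language-plaintext highlighter-rouge">version</code> fields. A minimal sketch of that check; <code class="language-plaintext highlighter-rouge">nodes_with</code> is a hypothetical helper and the sample body is illustrative, not captured from this cluster.</p>

```python
import json


def nodes_with(cat_plugins_body, component):
    """Names of nodes that report `component` in a GET /_cat/plugins?format=json body."""
    return {row["name"] for row in json.loads(cat_plugins_body) if row["component"] == component}


# Illustrative body in the shape _cat/plugins returns; not captured from the cluster above.
sample = """[
  {"name": "ip-172-31-42-1", "component": "ingest-attachment", "version": "2.10.0"},
  {"name": "ip-172-31-42-1", "component": "opensearch-security", "version": "2.10.0.0"}
]"""

print(nodes_with(sample, "ingest-attachment"))  # {'ip-172-31-42-1'}
```

<p>On a multi-node cluster, every node should appear in that set, since an ingest processor generally has to be available on each node for the pipeline to be accepted.</p>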
<p>Create an ingest pipeline.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>curl <span class="nt">-k</span> <span class="nt">-u</span> admin:admin <span class="nt">-X</span> PUT <span class="nt">-H</span> <span class="s2">"Content-type:application/json"</span> <span class="nt">--data</span> <span class="s1">'{"description":"Extract","processors":[{"attachment":{"field":"data","indexed_chars":-1}}]}'</span> https://localhost:9200/_ingest/pipeline/attachment | jq</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"acknowledged"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
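<p>The same call can be made from any language that can send raw JSON over HTTP. Below is a minimal Python sketch that uses only the standard library. It builds the <code class="language-plaintext highlighter-rouge">PUT</code> request for the pipeline above but does not send it. The helper name <code class="language-plaintext highlighter-rouge">attachment_pipeline_request</code> is made up for illustration. Actually sending the request would also need the basic-auth credentials and relaxed certificate checks that <code class="language-plaintext highlighter-rouge">-u admin:admin</code> and <code class="language-plaintext highlighter-rouge">-k</code> give curl.</p>

```python
import json
import urllib.request


def attachment_pipeline_request(base_url: str, pipeline_id: str = "attachment") -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the ingest pipeline."""
    body = {
        "description": "Extract",
        "processors": [
            # indexed_chars=-1 removes the limit on how many characters are extracted
            {"attachment": {"field": "data", "indexed_chars": -1}}
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = attachment_pipeline_request("https://localhost:9200")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```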
<p>Download a dummy PDF.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>wget https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf</code></pre></figure>
<p>Ingest the PDF.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>curl <span class="nt">-k</span> <span class="nt">-u</span> admin:admin <span class="nt">-X</span> PUT <span class="nt">-H</span> <span class="s2">"Content-type:application/json"</span> <span class="nt">--data</span> <span class="s1">'{"filename":"dummy.pdf","title":"Dummy PDF","data":"'</span><span class="s2">"</span><span class="si">$(</span><span class="nb">base64</span> <span class="nt">-w</span> 0 dummy.pdf<span class="si">)</span><span class="s2">"</span><span class="s1">'"}'</span> https://localhost:9200/my_index/_doc/1?pipeline<span class="o">=</span>attachment | jq</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"_index"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my_index"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_version"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"result"</span><span class="p">:</span><span class="w"> </span><span class="s2">"created"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_shards"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"total"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"successful"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"failed"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"_seq_no"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
</span><span class="nl">"_primary_term"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
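<p>The <code class="language-plaintext highlighter-rouge">-w 0</code> flag matters here. GNU <code class="language-plaintext highlighter-rouge">base64</code> wraps its output at 76 columns by default, and a raw line break inside a JSON string makes the request body invalid. Below is a rough Python equivalent. It assumes the PDF has already been read into memory as bytes, and the helper <code class="language-plaintext highlighter-rouge">attachment_document</code> is hypothetical. It relies on <code class="language-plaintext highlighter-rouge">base64.b64encode</code>, which never inserts line breaks.</p>

```python
import base64
import json


def attachment_document(filename: str, title: str, pdf_bytes: bytes) -> dict:
    """Build the body for PUT my_index/_doc/1?pipeline=attachment."""
    return {
        "filename": filename,
        "title": title,
        # b64encode emits a single line, like `base64 -w 0`
        "data": base64.b64encode(pdf_bytes).decode("ascii"),
    }


# In practice pdf_bytes would come from open("dummy.pdf", "rb").read()
doc = attachment_document("dummy.pdf", "Dummy PDF", b"%PDF-1.4 sample")
print(json.dumps(doc))
```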
<p>Search.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>curl <span class="nt">-k</span> <span class="nt">-u</span> admin:admin <span class="nt">-X</span> POST <span class="nt">-H</span> <span class="s2">"Content-type:application/json"</span> <span class="nt">--data</span> <span class="s1">'{"query":{"match":{"attachment.content":{"query":"dummy"}}}}'</span> https://localhost:9200/my_index/_search | jq</code></pre></figure>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
</span><span class="nl">"took"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"timed_out"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
</span><span class="nl">"_shards"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"total"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"successful"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"skipped"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
</span><span class="nl">"failed"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"hits"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"total"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="nl">"relation"</span><span class="p">:</span><span class="w"> </span><span class="s2">"eq"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"max_score"</span><span class="p">:</span><span class="w"> </span><span class="mf">0.39556286</span><span class="p">,</span><span class="w">
</span><span class="nl">"hits"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"_index"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my_index"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span class="w">
</span><span class="nl">"_score"</span><span class="p">:</span><span class="w"> </span><span class="mf">0.39556286</span><span class="p">,</span><span class="w">
</span><span class="nl">"_source"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"filename"</span><span class="p">:</span><span class="w"> </span><span class="s2">"dummy.pdf"</span><span class="p">,</span><span class="w">
</span><span class="nl">"data"</span><span class="p">:</span><span class="w"> </span><span class="s2">"..."</span><span class="p">,</span><span class="w">
</span><span class="nl">"attachment"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"date"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2007-02-23T15:56:37Z"</span><span class="p">,</span><span class="w">
</span><span class="nl">"content_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application/pdf"</span><span class="p">,</span><span class="w">
</span><span class="nl">"author"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Evangelos Vlachogiannis"</span><span class="p">,</span><span class="w">
</span><span class="nl">"language"</span><span class="p">:</span><span class="w"> </span><span class="s2">"mt"</span><span class="p">,</span><span class="w">
</span><span class="nl">"content"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Dummy PDF file</span><span class="se">\n\n\n\t</span><span class="s2">Dummy PDF file"</span><span class="p">,</span><span class="w">
</span><span class="nl">"content_length"</span><span class="p">:</span><span class="w"> </span><span class="mi">35</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Dummy PDF"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
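<p>From code, the search is just another JSON body. The extracted text lives under <code class="language-plaintext highlighter-rouge">_source.attachment.content</code> in each hit. The sketch below covers both halves. The helper names are hypothetical, and a trimmed-down copy of the response above stands in for a live call.</p>

```python
import json


def match_query(field: str, text: str) -> dict:
    """The match query used in the search request."""
    return {"query": {"match": {field: {"query": text}}}}


def extracted_contents(response: dict) -> list:
    """Collect the text extracted by the attachment processor from each hit."""
    return [
        hit["_source"]["attachment"]["content"]
        for hit in response["hits"]["hits"]
        if "attachment" in hit["_source"]
    ]


# A trimmed-down copy of the search response shown above
response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "filename": "dummy.pdf",
                    "attachment": {"content": "Dummy PDF file\n\n\n\tDummy PDF file"},
                },
            }
        ],
    }
}

print(json.dumps(match_query("attachment.content", "dummy")))
print(extracted_contents(response))
```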
<p><a href="https://code.dblock.org/2023/09/29/how-to-ingest-a-pdf-document-into-opensearch-with-ingest-attachment.html">How to ingest a PDF document into OpenSearch with the ingest-attachment plugin</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on September 29, 2023.</p>https://code.dblock.org/2023/09/29/writing-opensearch-plugins-and-extensions2023-09-29T00:00:00+00:002023-09-29T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>Check out <a href="https://www.youtube.com/watch?v=TZy7ViZbbHc">this talk recorded at OpenSearchCon 2023</a>, or continue reading.</p>
<p><iframe width="560" height="315" src="https://www.youtube.com/embed/TZy7ViZbbHc?si=3pS7bkbK0hpR_V86" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></p>
<p>Most custom functionality in <a href="https://opensearch.org">OpenSearch</a> is implemented with <a href="https://opensearch.org/docs/latest/install-and-configure/plugins/">plugins</a>. That is, in theory. In practice, much of the core functionality is also implemented in plugins. For example, security and k-NN search are both plugins, even though one would reasonably expect a security framework to be part of the core engine (with multiple implementations in plugins), or k-NN search to live right next to full-text search. Furthermore, some plugins, such as repository-s3, which reads and writes snapshots from/to Amazon S3, live in core, whereas one would expect optional functionality to be … optional. Where a plugin lives is more a consequence of business and organizational decisions than of technical ones. Software architecture really tends to line up with our business structures!</p>
<p>The default distribution of OpenSearch 2.10 ships with <a href="https://github.com/opensearch-project/opensearch-plugins/blob/main/plugins/.meta">20 plugins</a>, all enabled by default, erasing much of the difference between what’s <em>core</em> vs. what’s <em>a plugin</em>. A vast majority of users install and run the whole thing.</p>
<p>Plugins suffer from 3 major limitations: rigid version compatibility, lack of isolation, and transitive dependency hell. These problems are described in great detail <a href="https://opensearch.org/blog/introducing-extensions-for-opensearch/">in this blog post</a>, but before we go there, let’s follow <a href="https://logz.io/blog/opensearch-plugins/">another blog post</a> and write a plugin that implements a RESTful API. The complete source code for the plugin is <a href="https://github.com/dblock/opensearch-hello-plugin-java">here</a>.</p>
<p>A plugin inherits from <code class="language-plaintext highlighter-rouge">Plugin</code> and our plugin implements <code class="language-plaintext highlighter-rouge">ActionPlugin</code> (a plugin that exposes actions via REST). Our REST handler responds to <code class="language-plaintext highlighter-rouge">GET</code> requests.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">HelloPlugin</span> <span class="kd">extends</span> <span class="nc">Plugin</span> <span class="kd">implements</span> <span class="nc">ActionPlugin</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">RestHandler</span><span class="o">&gt;</span> <span class="nf">getRestHandlers</span><span class="o">(</span><span class="kd">final</span> <span class="nc">Settings</span> <span class="n">settings</span><span class="o">,</span>
<span class="kd">final</span> <span class="nc">RestController</span> <span class="n">restController</span><span class="o">,</span>
<span class="kd">final</span> <span class="nc">ClusterSettings</span> <span class="n">clusterSettings</span><span class="o">,</span>
<span class="kd">final</span> <span class="nc">IndexScopedSettings</span> <span class="n">indexScopedSettings</span><span class="o">,</span>
<span class="kd">final</span> <span class="nc">SettingsFilter</span> <span class="n">settingsFilter</span><span class="o">,</span>
<span class="kd">final</span> <span class="nc">IndexNameExpressionResolver</span> <span class="n">indexNameExpressionResolver</span><span class="o">,</span>
<span class="kd">final</span> <span class="nc">Supplier</span><span class="o">&lt;</span><span class="nc">DiscoveryNodes</span><span class="o">&gt;</span> <span class="n">nodesInCluster</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">singletonList</span><span class="o">(</span><span class="k">new</span> <span class="nc">RestHelloAction</span><span class="o">());</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">RestHelloAction</span> <span class="kd">extends</span> <span class="nc">BaseRestHandler</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Route</span><span class="o">&gt;</span> <span class="nf">routes</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">unmodifiableList</span><span class="o">(</span><span class="n">asList</span><span class="o">(</span>
<span class="k">new</span> <span class="nf">Route</span><span class="o">(</span><span class="no">GET</span><span class="o">,</span> <span class="s">"/_plugins/hello-world-java"</span><span class="o">)</span>
<span class="o">));</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nc">String</span> <span class="nf">getName</span><span class="o">()</span> <span class="o">{</span>
<span class="c1">// Human-readable handler name (any descriptive string); required by BaseRestHandler</span>
<span class="k">return</span> <span class="s">"rest_hello_action"</span><span class="o">;</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">protected</span> <span class="nc">RestChannelConsumer</span> <span class="nf">prepareRequest</span><span class="o">(</span>
<span class="nc">RestRequest</span> <span class="n">request</span><span class="o">,</span>
<span class="nc">NodeClient</span> <span class="n">client</span><span class="o">)</span> <span class="kd">throws</span> <span class="nc">IOException</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">channel</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">channel</span><span class="o">.</span><span class="na">sendResponse</span><span class="o">(</span><span class="k">new</span> <span class="nc">BytesRestResponse</span><span class="o">(</span>
<span class="nc">RestStatus</span><span class="o">.</span><span class="na">OK</span><span class="o">,</span>
<span class="s">"Hello from Java! 👋\n"</span>
<span class="o">)</span>
<span class="o">);</span>
<span class="o">};</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>Let’s install the plugin, start OpenSearch, and make an HTTP request to the newly added endpoint on the OpenSearch node. The request will be forwarded to the plugin and the REST handler will handle it.</p>
<p><img src="https://code.dblock.org/images/posts/2023/2023-09-29-writing-opensearch-plugins-and-extensions/plugin.gif" alt="" class="black" /></p>
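<p>The same request can be sketched in Python with the standard library. This assumes the security plugin is enabled with the demo <code class="language-plaintext highlighter-rouge">admin:admin</code> credentials used in the other examples on this page. If it is not enabled, drop the <code class="language-plaintext highlighter-rouge">Authorization</code> header. The helper names are hypothetical. The insecure SSL context mirrors <code class="language-plaintext highlighter-rouge">curl -k</code> and is only appropriate for a local demo node with a self-signed certificate.</p>

```python
import base64
import ssl
import urllib.request


def hello_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the plugin's route, using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/_plugins/hello-world-java",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def insecure_context() -> ssl.SSLContext:
    """Equivalent of curl -k: accept the demo node's self-signed certificate."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before turning off verification
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


req = hello_request("https://localhost:9200", "admin", "admin")
# Against a running node (not executed here):
# with urllib.request.urlopen(req, context=insecure_context()) as resp:
#     print(resp.read().decode("utf-8"))  # the plugin's greeting
```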
<p>How easy was it to write a plugin? Very easy! But it’s much harder to write a <em>production</em> plugin on top of the 1.4MM LOC OpenSearch core. You will need to master dependency injection and understand both OpenSearch’s runtime thread pools and the (optional) security framework. Finally, I promise that you will have a <em>very</em> hard time playing nice with other plugins that share the same Java heap and execute in the same Java Virtual Machine, deployed on every node of a large-scale cluster that is actively indexing petabytes of data or serving thousands of searches per second.</p>
<p>What can we do about it?</p>
<p>In OpenSearch 2.9 we introduced a new concept called <em>extensions</em> and shipped an experimental <a href="https://github.com/opensearch-project/opensearch-sdk-java">OpenSearch Java SDK</a>. Extensions are full processes that run in a separate JVM and can execute on a separate host.</p>
<p>The code for an extension with its REST handler is almost identical to the one for a plugin. This was done on purpose to help migrations. The complete source code for this extension is <a href="https://github.com/opensearch-project/opensearch-sdk-java/tree/main/src/main/java/org/opensearch/sdk/sample/helloworld">here</a>.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">HelloWorldExtension</span> <span class="kd">extends</span> <span class="nc">BaseExtension</span> <span class="kd">implements</span> <span class="nc">ActionExtension</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nc">List</span><span class="o"><</span><span class="nc">ExtensionRestHandler</span><span class="o">></span> <span class="nf">getExtensionRestHandlers</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">List</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="nc">RestHelloAction</span><span class="o">());</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">RestHelloAction</span> <span class="kd">extends</span> <span class="nc">BaseExtensionRestHandler</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nc">List</span><span class="o"><</span><span class="nc">NamedRoute</span><span class="o">></span> <span class="nf">routes</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nc">List</span><span class="o">.</span><span class="na">of</span><span class="o">(</span>
<span class="k">new</span> <span class="nc">NamedRoute</span><span class="o">.</span><span class="na">Builder</span><span class="o">().</span><span class="na">method</span><span class="o">(</span><span class="no">GET</span><span class="o">).</span><span class="na">path</span><span class="o">(</span><span class="s">"/hello"</span><span class="o">)</span>
<span class="o">.</span><span class="na">handler</span><span class="o">(</span><span class="n">handleGetRequest</span><span class="o">)</span>
<span class="o">.</span><span class="na">build</span><span class="o">()</span>
<span class="o">);</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="nc">Function</span><span class="o"><</span><span class="nc">RestRequest</span><span class="o">,</span> <span class="nc">ExtensionRestResponse</span><span class="o">></span> <span class="n">handleGetRequest</span> <span class="o">=</span>
<span class="o">(</span><span class="n">request</span><span class="o">)</span> <span class="o">-></span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">ExtensionRestResponse</span><span class="o">(</span>
<span class="n">request</span><span class="o">,</span> <span class="no">OK</span><span class="o">,</span> <span class="s">"Hello from Java! 👋\n"</span>
<span class="o">);</span>
<span class="o">};</span>
<span class="o">}</span></code></pre></figure>
<p>Let’s enable the experimental extensions feature in OpenSearch, then install and run this extension.</p>
<p><img src="https://code.dblock.org/images/posts/2023/2023-09-29-writing-opensearch-plugins-and-extensions/java-extension.gif" alt="" class="black" /></p>
<p>Extensions overcome the major limitations of plugins: they are semver compatible (you can run an extension on many versions of OpenSearch without rebuilding it), they can be installed without restarting the cluster, and they are isolated at runtime. Because you can run an extension remotely, you can also right-size the extension node (no need to add memory to every node in the cluster because one plugin occasionally needs it). In <a href="https://opensearch.org/blog/introducing-extensions-for-opensearch/">the introductory blog post</a> we used extensions to cut the cost of a 36-node cluster performing high-cardinality anomaly detection by a third.</p>
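<p>Here is a deliberately simplified Python model that makes the compatibility difference concrete. It assumes that a plugin must match the OpenSearch version exactly. It also assumes that an extension only needs the same major version and a cluster at least as new as the version it was built against. The actual rules enforced by OpenSearch and the SDK may differ, and all function names are illustrative.</p>

```python
def parse_version(version: str) -> tuple:
    """Split a version string such as "2.10.0" into (2, 10, 0)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def plugin_compatible(built_for: str, running: str) -> bool:
    # Plugins: rigid compatibility, the versions must match exactly.
    return parse_version(built_for) == parse_version(running)


def extension_compatible(built_for: str, running: str) -> bool:
    # Extensions (simplified semver assumption): same major version,
    # and the cluster is at least as new as the version built against.
    built, run = parse_version(built_for), parse_version(running)
    return built[0] == run[0] and run >= built


for cluster in ["2.9.0", "2.10.0", "3.0.0"]:
    print(cluster, plugin_compatible("2.9.0", cluster), extension_compatible("2.9.0", cluster))
```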
<p>Other than reducing costs, what else can we use this technology for?</p>
<p>Python is the language of machine learning. Unlike plugins, extensions can also be written in Python. The complete source code for the sample below is <a href="https://github.com/opensearch-project/opensearch-sdk-py/tree/main/samples/hello">here</a>, and it looks very similar to the Java one.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">class</span> <span class="nc">HelloExtension</span><span class="p">(</span><span class="n">Extension</span><span class="p">,</span> <span class="n">ActionExtension</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">Extension</span><span class="p">.</span><span class="n">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s">"hello-world"</span><span class="p">)</span>
<span class="n">ActionExtension</span><span class="p">.</span><span class="n">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
<span class="o">@</span><span class="nb">property</span>
<span class="k">def</span> <span class="nf">rest_handlers</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="p">[</span><span class="n">HelloRestHandler</span><span class="p">()]</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">class</span> <span class="nc">HelloRestHandler</span><span class="p">(</span><span class="n">ExtensionRestHandler</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">handle_request</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">rest_request</span><span class="p">):</span>
<span class="k">return</span> <span class="n">ExtensionRestResponse</span><span class="p">(</span>
<span class="n">RestStatus</span><span class="p">.</span><span class="n">OK</span><span class="p">,</span>
<span class="nb">bytes</span><span class="p">(</span><span class="s">"Hello from Python! 👋</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="s">"utf-8"</span><span class="p">),</span>
<span class="n">ExtensionRestResponse</span><span class="p">.</span><span class="n">TEXT_CONTENT_TYPE</span>
<span class="p">)</span>
<span class="o">@</span><span class="nb">property</span>
<span class="k">def</span> <span class="nf">routes</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="p">[</span>
<span class="n">NamedRoute</span><span class="p">(</span><span class="n">method</span><span class="o">=</span><span class="n">RestMethod</span><span class="p">.</span><span class="n">GET</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="s">"/hello"</span><span class="p">)</span>
<span class="p">]</span></code></pre></figure>
<p>Let’s enable the experimental extensions feature in OpenSearch, then install and run this extension.</p>
<p><img src="https://code.dblock.org/images/posts/2023/2023-09-29-writing-opensearch-plugins-and-extensions/python-extension.gif" alt="" class="black" /></p>
<p>How is this even possible? What is the bridge between OpenSearch, implemented in Java, and an extension written in Python?</p>
<p>With most of the heavy lifting done by <a href="https://twitter.com/dbwiddis">Dan Widdis</a> of <a href="https://github.com/oshi/oshi">OSHI</a> fame, we reverse-engineered and then <a href="https://github.com/opensearch-project/opensearch-sdk-py/tree/main/src/opensearch_sdk_py/transport">implemented</a> the Elasticsearch/OpenSearch transport protocol in Python, and brought extensions support along for the ride. The latter was easy because extension messages are all defined with protobuf, and those definitions can be compiled to Python with existing tools.</p>
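<p>The sketch below gives a flavor of what implementing the transport protocol involves. It packs and parses the fixed-size header that precedes every transport message. This is not the SDK’s code. The field layout is assumed from the Elasticsearch 7.x-era <code class="language-plaintext highlighter-rouge">TcpHeader</code>: marker, length, request id, status, version, and variable header size, all big-endian. The linked <code class="language-plaintext highlighter-rouge">transport</code> package remains the authoritative reference.</p>

```python
import struct

# Assumed fixed header layout (big-endian, no padding):
#   2 bytes  marker b"ES"
#   4 bytes  message length (everything after this field)
#   8 bytes  request id
#   1 byte   status flags
#   4 bytes  version id
#   4 bytes  variable header size
HEADER = struct.Struct(">2siqBii")


def encode_message(request_id: int, status: int, version_id: int,
                   variable_header: bytes, payload: bytes) -> bytes:
    # request id + status + version + variable-header-size fields
    fixed_after_length = 8 + 1 + 4 + 4
    length = fixed_after_length + len(variable_header) + len(payload)
    header = HEADER.pack(b"ES", length, request_id, status, version_id, len(variable_header))
    return header + variable_header + payload


def decode_header(message: bytes) -> dict:
    marker, length, request_id, status, version_id, var_size = HEADER.unpack_from(message)
    if marker != b"ES":
        raise ValueError("not a transport message")
    return {
        "length": length,
        "request_id": request_id,
        "status": status,
        "version_id": version_id,
        "variable_header_size": var_size,
    }


msg = encode_message(request_id=42, status=0, version_id=123, variable_header=b"vh", payload=b"body")
print(decode_header(msg))
```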
<p>In theory, to quote an engineer from <a href="https://aryn.ai/">Aryn</a>, this <em>opens up the entire Python model zoo-space to OpenSearch</em>, including TensorFlow and PyTorch. But it also shows how one could implement an entire OpenSearch node in another language that doesn’t suffer from, for example, GC pauses.</p>
<p>I hope that someone reading this blog post will build a useful extension in Python for OpenSearch. Can <em>you</em> make it happen?</p>
<p><a href="https://code.dblock.org/2023/09/29/writing-opensearch-plugins-and-extensions.html">Writing OpenSearch Plugins and Extensions (in Python)</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on September 29, 2023.</p>https://code.dblock.org/2023/08/08/changing-the-default-admin-password-in-opensearch2023-08-08T00:00:00+00:002023-08-08T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>OpenSearch ships with a <a href="https://opensearch.org/docs/latest/">pretty comprehensive guide</a> on getting started, along with a detailed reference to its vast <a href="https://opensearch.org/docs/latest/security/configuration/index/">security configuration</a>. This can be a bit overwhelming. Here’s how one can change the default “admin” password. In my case I’ll do it inside my demo Docker container, but you can skip the Docker parts if you’re just downloading and installing OpenSearch directly.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker pull opensearchproject/opensearch:latest
docker run <span class="nt">-d</span> <span class="nt">-p</span> 9200:9200 <span class="nt">-p</span> 9600:9600 <span class="nt">-e</span> <span class="s2">"discovery.type=single-node"</span> opensearchproject/opensearch:latest</code></pre></figure>
<p>Ensure that the default username and password work.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>curl <span class="nt">--insecure</span> <span class="nt">-u</span> admin:invalid https://localhost:9200
Unauthorized
<span class="nv">$ </span>curl <span class="nt">--insecure</span> <span class="nt">-u</span> admin:admin https://localhost:9200
<span class="o">{</span>
<span class="s2">"name"</span> : <span class="s2">"b09419b98216"</span>,
<span class="s2">"cluster_name"</span> : <span class="s2">"docker-cluster"</span>,
<span class="s2">"cluster_uuid"</span> : <span class="s2">"SYUzvRvqT06ld8IdvE5okQ"</span>,
<span class="s2">"version"</span> : <span class="o">{</span>
<span class="s2">"distribution"</span> : <span class="s2">"opensearch"</span>,
<span class="s2">"number"</span> : <span class="s2">"2.9.0"</span>,
<span class="s2">"build_type"</span> : <span class="s2">"tar"</span>,
<span class="s2">"build_hash"</span> : <span class="s2">"1164221ee2b8ba3560f0ff492309867beea28433"</span>,
<span class="s2">"build_date"</span> : <span class="s2">"2023-07-18T21:22:48.164885046Z"</span>,
<span class="s2">"build_snapshot"</span> : <span class="nb">false</span>,
<span class="s2">"lucene_version"</span> : <span class="s2">"9.7.0"</span>,
<span class="s2">"minimum_wire_compatibility_version"</span> : <span class="s2">"7.10.0"</span>,
<span class="s2">"minimum_index_compatibility_version"</span> : <span class="s2">"7.0.0"</span>
<span class="o">}</span>,
<span class="s2">"tagline"</span> : <span class="s2">"The OpenSearch Project: https://opensearch.org/"</span>
<span class="o">}</span></code></pre></figure>
<h3 id="the-easy-way">The Easy Way</h3>
<p>Users can change passwords using the <a href="https://opensearch.org/docs/latest/security/access-control/api/">security plugin REST API</a>. We can examine the <code class="language-plaintext highlighter-rouge">admin</code> user.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl <span class="nt">--insecure</span> <span class="nt">-u</span> admin:password <span class="nt">-X</span> GET <span class="s2">"https://localhost:9200/_plugins/_security/api/account"</span>
<span class="o">{</span>
<span class="s2">"user_name"</span>: <span class="s2">"admin"</span>,
<span class="s2">"is_reserved"</span>: <span class="nb">true</span>,
<span class="s2">"is_hidden"</span>: <span class="nb">false</span>,
<span class="s2">"is_internal_user"</span>: <span class="nb">true</span>,
<span class="s2">"user_requested_tenant"</span>: null,
<span class="s2">"backend_roles"</span>: <span class="o">[</span>
<span class="s2">"admin"</span>
<span class="o">]</span>,
<span class="s2">"custom_attribute_names"</span>: <span class="o">[]</span>,
<span class="s2">"tenants"</span>: <span class="o">{</span>
<span class="s2">"global_tenant"</span>: <span class="nb">true</span>,
<span class="s2">"admin_tenant"</span>: <span class="nb">true</span>,
<span class="s2">"admin"</span>: <span class="nb">true</span>
<span class="o">}</span>,
<span class="s2">"roles"</span>: <span class="o">[</span>
<span class="s2">"own_index"</span>,
<span class="s2">"all_access"</span>
<span class="o">]</span>
<span class="o">}</span></code></pre></figure>
<p>However, updating the admin password doesn’t work because the default security policy locks it down.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">curl <span class="nt">--insecure</span> <span class="nt">-u</span> admin:password <span class="nt">-XPUT</span> <span class="s2">"https://localhost:9200/_plugins/_security/api/account"</span> <span class="nt">-H</span> <span class="s1">'Content-Type: application/json'</span> <span class="nt">-d</span><span class="s1">'
{
"current_password": "password",
"password": "6P2fTnMRTnDRiEEm"
}'</span>
<span class="o">{</span><span class="s2">"status"</span>:<span class="s2">"FORBIDDEN"</span>,<span class="s2">"message"</span>:<span class="s2">"Resource 'admin' is read-only."</span><span class="o">}</span></code></pre></figure>
<p>I found <a href="https://github.com/opensearch-project/security/issues/1576">security#1576</a>, which aims to fix this, but in the meantime we’ll have to do it the hard way.</p>
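<p>Whichever way the password ends up being changed, we need a new one. The minimal Python sketch below generates a random alphanumeric password with the standard <code class="language-plaintext highlighter-rouge">secrets</code> module. It then builds the JSON body that the account API expects. The helper names are hypothetical. For the reserved <code class="language-plaintext highlighter-rouge">admin</code> user, this request still returns <code class="language-plaintext highlighter-rouge">FORBIDDEN</code>, as shown above.</p>

```python
import json
import secrets
import string


def random_password(length: int = 16) -> str:
    """Random alphanumeric password drawn from a cryptographically secure source."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def change_password_body(current_password: str, new_password: str) -> str:
    """JSON body for PUT _plugins/_security/api/account."""
    return json.dumps({"current_password": current_password, "password": new_password})


new_password = random_password()
body = change_password_body("admin", new_password)
print(body)
```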
<h3 id="the-hard-way">The Hard Way</h3>
<p>The source for the docker-compose file used for the distribution is <a href="https://github.com/opensearch-project/opensearch-build/blob/main/docker/release/dockercomposefiles/docker-compose-2.x.yml">here</a>. The first time the container runs, its entrypoint <a href="https://github.com/opensearch-project/opensearch-build/blob/main/docker/release/config/opensearch/opensearch-docker-entrypoint.sh#L38">executes</a> the <a href="https://github.com/opensearch-project/security/blob/main/tools/install_demo_configuration.sh">install_demo_configuration.sh</a> script from the security plugin. That script in turn runs <a href="https://github.com/opensearch-project/security/blob/main/tools/securityadmin.sh">securityadmin_demo.sh</a>, which in turn runs <a href="https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/tools/SecurityAdmin.java"><code class="language-plaintext highlighter-rouge">org.opensearch.security.tools.SecurityAdmin</code></a>, written in Java. This installs a default security configuration. Let’s see what it looks like.</p>
<p>Find the docker container ID. In my case it’s <code class="language-plaintext highlighter-rouge">b09419b98216</code>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>docker ps
CONTAINER ID IMAGE
b09419b98216 opensearchproject/opensearch:latest ...</code></pre></figure>
<p>Run a shell in the container.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>docker <span class="nb">exec</span> <span class="nt">-it</span> b09419b98216 sh
sh-4.2<span class="err">$</span></code></pre></figure>
<p>Run the security plugin configuration tool to output the current configuration.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">mkdir </span>current-config
<span class="nv">$ </span>/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh <span class="se">\</span>
<span class="nt">-icl</span> <span class="se">\</span>
<span class="nt">-cacert</span> /usr/share/opensearch/config/root-ca.pem <span class="se">\</span>
<span class="nt">-cert</span> /usr/share/opensearch/config/kirk.pem <span class="se">\</span>
<span class="nt">-key</span> /usr/share/opensearch/config/kirk-key.pem <span class="se">\</span>
<span class="nt">-r</span> <span class="se">\</span>
<span class="nt">-cd</span> current-config</code></pre></figure>
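<p>Examine the copy of <code class="language-plaintext highlighter-rouge">internal_users.yml</code> that was written to <code class="language-plaintext highlighter-rouge">current-config</code> with <code class="language-plaintext highlighter-rouge">cat current-config/internal_users_*.yml</code>. The file name includes a timestamp (mine was called <code class="language-plaintext highlighter-rouge">internal_users_2023-Aug-08_15-52-25.yml</code>). The interesting part is the <code class="language-plaintext highlighter-rouge">admin</code> user, which is marked <code class="language-plaintext highlighter-rouge">reserved: true</code>; reserved resources are read-only through the REST API, which explains the error above.</p>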
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">admin</span><span class="pi">:</span>
<span class="na">hash</span><span class="pi">:</span> <span class="s2">"</span><span class="s">$2a$12$VcCDgh2NDk07JGN0rjGbM.Ad41qVR/YFJcgHp0UGns5JDymv..TOG"</span>
<span class="na">reserved</span><span class="pi">:</span> <span class="no">true</span>
<span class="na">backend_roles</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">admin"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Demo</span><span class="nv"> </span><span class="s">admin</span><span class="nv"> </span><span class="s">user"</span></code></pre></figure>
<p>Let’s generate a new password hash for our new password, <code class="language-plaintext highlighter-rouge">password</code>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sh-4.2<span class="nv">$ </span>./plugins/opensearch-security/tools/hash.sh
<span class="k">**************************************************************************</span>
<span class="k">**</span> This tool will be deprecated <span class="k">in </span>the next major release of OpenSearch <span class="k">**</span>
<span class="k">**</span> https://github.com/opensearch-project/security/issues/1755 <span class="k">**</span>
<span class="k">**************************************************************************</span>
<span class="o">[</span>Password:] password
<span class="nv">$2y$12$jeBybG79iCu0y</span>.A1NMqdI.8gA/d0Mrg6VRI3BrGD4VvTfeA1Z4tXu</code></pre></figure>
<p>Edit the <code class="language-plaintext highlighter-rouge">current-config/internal_users_*.yml</code> file and replace the <code class="language-plaintext highlighter-rouge">admin</code> password hash with the one above.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">admin</span><span class="pi">:</span>
<span class="na">hash</span><span class="pi">:</span> <span class="s2">"</span><span class="s">$2y$12$jeBybG79iCu0y.A1NMqdI.8gA/d0Mrg6VRI3BrGD4VvTfeA1Z4tXu"</span>
<span class="na">reserved</span><span class="pi">:</span> <span class="no">true</span>
<span class="na">backend_roles</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s2">"</span><span class="s">admin"</span>
<span class="na">description</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Demo</span><span class="nv"> </span><span class="s">admin</span><span class="nv"> </span><span class="s">user"</span></code></pre></figure>
<p>Upload the configuration.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh <span class="se">\</span>
<span class="nt">-icl</span> <span class="se">\</span>
<span class="nt">-t</span> internalusers <span class="se">\</span>
<span class="nt">-f</span> current-config/internal_users_[your file name here].yml <span class="se">\</span>
<span class="nt">-cacert</span> /usr/share/opensearch/config/root-ca.pem <span class="se">\</span>
<span class="nt">-cert</span> /usr/share/opensearch/config/kirk.pem <span class="se">\</span>
<span class="nt">-key</span> /usr/share/opensearch/config/kirk-key.pem
Security Admin v7
Will connect to localhost:9200 ... <span class="k">done
</span>Connected as <span class="s2">"CN=kirk,OU=client,O=client,L=test,C=de"</span>
OpenSearch Version: 2.9.0
Contacting opensearch cluster <span class="s1">'opensearch'</span> and <span class="nb">wait </span><span class="k">for </span>YELLOW clusterstate ...
Clustername: docker-cluster
Clusterstate: YELLOW
Number of nodes: 1
Number of data nodes: 1
.opendistro_security index already exists, so we <span class="k">do </span>not need to create one.
Populate config from /usr/share/opensearch
Force <span class="nb">type</span>: internalusers
Will update <span class="s1">'/internalusers'</span> with current-config/internal_users_....yml
SUCC: Configuration <span class="k">for</span> <span class="s1">'internalusers'</span> created or updated</code></pre></figure>
<p>Test the new password.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>curl <span class="nt">--insecure</span> <span class="nt">-u</span> admin:admin https://localhost:9200
Unauthorized</code></pre></figure>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>curl <span class="nt">--insecure</span> <span class="nt">-u</span> admin:password https://localhost:9200
<span class="o">{</span>
<span class="s2">"name"</span> : <span class="s2">"b09419b98216"</span>,
<span class="s2">"cluster_name"</span> : <span class="s2">"docker-cluster"</span>,
<span class="s2">"cluster_uuid"</span> : <span class="s2">"SYUzvRvqT06ld8IdvE5okQ"</span>,
<span class="s2">"version"</span> : <span class="o">{</span>
<span class="s2">"distribution"</span> : <span class="s2">"opensearch"</span>,
<span class="s2">"number"</span> : <span class="s2">"2.9.0"</span>,
<span class="s2">"build_type"</span> : <span class="s2">"tar"</span>,
<span class="s2">"build_hash"</span> : <span class="s2">"1164221ee2b8ba3560f0ff492309867beea28433"</span>,
<span class="s2">"build_date"</span> : <span class="s2">"2023-07-18T21:22:48.164885046Z"</span>,
<span class="s2">"build_snapshot"</span> : <span class="nb">false</span>,
<span class="s2">"lucene_version"</span> : <span class="s2">"9.7.0"</span>,
<span class="s2">"minimum_wire_compatibility_version"</span> : <span class="s2">"7.10.0"</span>,
<span class="s2">"minimum_index_compatibility_version"</span> : <span class="s2">"7.0.0"</span>
<span class="o">}</span>,
<span class="s2">"tagline"</span> : <span class="s2">"The OpenSearch Project: https://opensearch.org/"</span>
<span class="o">}</span></code></pre></figure>
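<p>Note that the new password is stored in this container’s security index. A new container created from the image (for example, with another <code class="language-plaintext highlighter-rouge">docker run</code>) starts with the default demo configuration again, so you will need to repeat these steps.</p>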
<p><a href="https://code.dblock.org/2023/08/08/changing-the-default-admin-password-in-opensearch.html">Changing the default admin password in OpenSearch</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on August 08, 2023.</p>https://code.dblock.org/2023/06/16/getting-started-with-vector-dbs-in-python2023-06-16T00:00:00+00:002023-06-16T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>Vector databases are all the rage today.</p>
<p>I’ve built a few iterations of vector search, beginning in 2011 at Artsy, powered by the <a href="https://en.wikipedia.org/wiki/The_Art_Genome_Project">Art Genome Project</a>. Compared to today’s LLM use cases, Artsy’s is a small, 1200-dimensional sparse vector and semantic search engine. The first attempt was a brute-force, exact k-nearest-neighbor search written in Ruby, with data stored in MongoDB. The second attempt was an approximate nearest-neighbor implementation using <a href="https://en.wikipedia.org/wiki/Locality-sensitive_hashing">LSH</a>, and finally <a href="https://www.cs.princeton.edu/cass/papers/www11.pdf">NN-Descent</a>. Around 2017 we migrated to Elasticsearch, and I suspect the team has moved to OpenSearch by now because it’s open-source.</p>
<p>Things have evolved rapidly with generative AI, so let’s try to index and search some vectors in Python in 2023, using the simplest libraries available, usually plain HTTP. You can draw your own conclusions about which engines are better and/or easier to use. Working code for this blog post is <a href="https://github.com/dblock/vectordb-hello-world">here</a>.</p>
<p>In alphabetical order.</p>
<ul>
<li><a href="#chroma">Chroma</a></li>
<li><a href="#clickhouse">ClickHouse</a></li>
<li><a href="#myscale">MyScale</a></li>
<li><a href="#opensearch">OpenSearch</a></li>
<li><a href="#pgvector">pgVector</a></li>
<li><a href="#pinecone">Pinecone</a></li>
<li><a href="#qdrant">Qdrant</a></li>
<li><a href="#redis">Redis</a></li>
<li><a href="#vespa">Vespa</a></li>
<li><a href="#weaviate">Weaviate</a></li>
<li><a href="#others">Others</a></li>
</ul>
<h3 id="chroma">Chroma</h3>
<p><a href="https://www.trychroma.com/">Chroma</a> is an AI-native open-source embedding database. You can clone Chroma from GitHub and run it locally.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">git clone https://github.com/chroma-core/chroma.git
<span class="nb">cd </span>chroma
docker-compose up <span class="nt">-d</span> <span class="nt">--build</span></code></pre></figure>
<p>Chroma comes with Python and JavaScript clients, but underneath they use a fairly straightforward <a href="https://github.com/chroma-core/chroma/blob/main/chromadb/api/fastapi.py#L46">HTTP interface</a> that speaks JSON. The following prints the server version.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">endpoint</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s">"ENDPOINT"</span><span class="p">,</span> <span class="s">'http://localhost:8000'</span><span class="p">)</span>
<span class="n">api_url</span> <span class="o">=</span> <span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">'/api/v1/'</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="n">client</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">urljoin</span><span class="p">(</span><span class="n">api_url</span><span class="p">,</span> <span class="s">'version'</span><span class="p">)).</span><span class="n">json</span><span class="p">())</span></code></pre></figure>
<p>You can check whether a collection exists by querying <code class="language-plaintext highlighter-rouge">/api/v1/collections/{name}</code>, but Chroma returns a 500 error when it doesn’t, so it gets messy. Chroma also seems to let you refer to a collection by either name or ID, but not in all APIs, so we need the ID anyway. Let’s get it either from the list of <code class="language-plaintext highlighter-rouge">collections</code> or from the response to creating a collection.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">collection_name</span> <span class="o">=</span> <span class="s">"my-collection"</span>
<span class="n">collections</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">urljoin</span><span class="p">(</span><span class="n">api_url</span><span class="p">,</span> <span class="s">"collections"</span><span class="p">)).</span><span class="n">json</span><span class="p">()</span>
<span class="n">collection</span> <span class="o">=</span> <span class="nb">next</span><span class="p">((</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">collections</span> <span class="k">if</span> <span class="n">x</span><span class="p">[</span><span class="s">"name"</span><span class="p">]</span> <span class="o">==</span> <span class="n">collection_name</span><span class="p">),</span> <span class="bp">None</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">collection</span><span class="p">:</span>
<span class="n">collection</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">api_url</span><span class="p">,</span> <span class="s">"collections"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="p">{</span>
<span class="s">"name"</span><span class="p">:</span> <span class="n">collection_name</span>
<span class="p">},</span>
<span class="p">).</span><span class="n">json</span><span class="p">()</span></code></pre></figure>
<p>Chroma is opinionated about how it receives data: parallel arrays of IDs, embeddings, metadata, etc., rather than one object per record.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"d8f940f1-d6c1-4d8e-82c1-488eb7801e57"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"c47eade8-59b9-4c49-9172-a0ce3d9dd0af"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"action"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">]</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"ids"</span><span class="p">:</span> <span class="p">[],</span>
<span class="s">"embeddings"</span><span class="p">:</span> <span class="p">[],</span>
<span class="s">"metadatas"</span><span class="p">:</span> <span class="p">[]</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">vector</span> <span class="ow">in</span> <span class="n">vectors</span><span class="p">:</span>
<span class="n">data</span><span class="p">[</span><span class="s">"ids"</span><span class="p">].</span><span class="n">append</span><span class="p">(</span><span class="n">vector</span><span class="p">[</span><span class="s">"id"</span><span class="p">])</span>
<span class="n">data</span><span class="p">[</span><span class="s">"embeddings"</span><span class="p">].</span><span class="n">append</span><span class="p">(</span><span class="n">vector</span><span class="p">[</span><span class="s">"values"</span><span class="p">])</span>
<span class="n">data</span><span class="p">[</span><span class="s">"metadatas"</span><span class="p">].</span><span class="n">append</span><span class="p">(</span><span class="n">vector</span><span class="p">[</span><span class="s">"metadata"</span><span class="p">])</span>
<span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span><span class="n">urljoin</span><span class="p">(</span><span class="n">api_url</span><span class="p">,</span> <span class="sa">f</span><span class="s">"collections/</span><span class="si">{</span><span class="n">collection</span><span class="p">[</span><span class="s">'id'</span><span class="p">]</span><span class="si">}</span><span class="s">/add"</span><span class="p">),</span> <span class="n">json</span><span class="o">=</span><span class="n">data</span><span class="p">)</span></code></pre></figure>
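<p>Searching works the same way, with a POST to the collection’s <code class="language-plaintext highlighter-rouge">query</code> endpoint. Chroma can handle tokenization, embedding, and indexing automatically, but it also supports basic vector search with precomputed embeddings via <code class="language-plaintext highlighter-rouge">query_embeddings</code>.</p>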
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">query</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"query_embeddings"</span><span class="p">:</span> <span class="p">[[</span><span class="mf">0.15</span><span class="p">,</span> <span class="mf">0.12</span><span class="p">,</span> <span class="mf">1.23</span><span class="p">]],</span>
<span class="s">"n_results"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s">"include"</span><span class="p">:[</span><span class="s">"embeddings"</span><span class="p">,</span> <span class="s">"metadatas"</span><span class="p">]</span>
<span class="p">}</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">api_url</span><span class="p">,</span> <span class="sa">f</span><span class="s">"collections/</span><span class="si">{</span><span class="n">collection</span><span class="p">[</span><span class="s">'id'</span><span class="p">]</span><span class="si">}</span><span class="s">/query"</span><span class="p">),</span> <span class="n">json</span><span class="o">=</span><span class="n">query</span>
<span class="p">).</span><span class="n">json</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="n">results</span><span class="p">)</span></code></pre></figure>
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/chroma/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">ENDPOINT</span><span class="o">=</span>http://localhost:8000 poetry run ./hello.py
<span class="nv">$ ENDPOINT</span><span class="o">=</span>http://localhost:8000 poetry run ./hello.py
<span class="o">></span> GET http://localhost:8000/api/v1/version
< GET http://localhost:8000/api/v1/version - 200
Chroma 0.4.3
<span class="o">></span> GET http://localhost:8000/api/v1/collections
< GET http://localhost:8000/api/v1/collections - 200
<span class="o">></span> POST http://localhost:8000/api/v1/collections
< POST http://localhost:8000/api/v1/collections - 200
<span class="o">></span> POST http://localhost:8000/api/v1/collections/f5aae9cc-a0c1-4990-9942-7a47542b9f64/add
< POST http://localhost:8000/api/v1/collections/f5aae9cc-a0c1-4990-9942-7a47542b9f64/add - 201
<span class="o">></span> POST http://localhost:8000/api/v1/collections/f5aae9cc-a0c1-4990-9942-7a47542b9f64/query
< POST http://localhost:8000/api/v1/collections/f5aae9cc-a0c1-4990-9942-7a47542b9f64/query - 200
<span class="o">{</span><span class="s1">'ids'</span>: <span class="o">[[</span><span class="s1">'c47eade8-59b9-4c49-9172-a0ce3d9dd0af'</span><span class="o">]]</span>, <span class="s1">'distances'</span>: None, <span class="s1">'metadatas'</span>: <span class="o">[[{</span><span class="s1">'genre'</span>: <span class="s1">'action'</span><span class="o">}]]</span>, <span class="s1">'embeddings'</span>: <span class="o">[[[</span>0.2, 0.3, 0.4]]], <span class="s1">'documents'</span>: None<span class="o">}</span>
<span class="o">></span> DELETE http://localhost:8000/api/v1/collections/my-collection
< DELETE http://localhost:8000/api/v1/collections/my-collection - 200</code></pre></figure>
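<p>The result is easy to double-check with a brute-force, exact k-nearest-neighbor search over the two vectors, the same approach as my first attempt at Artsy. This sketch assumes Chroma’s default <code class="language-plaintext highlighter-rouge">l2</code> distance; whether that is plain or squared Euclidean distance, the ranking is the same, so <code class="language-plaintext highlighter-rouge">c47eade8-59b9-4c49-9172-a0ce3d9dd0af</code> should come out closest. The function names are hypothetical.</p>

```python
from __future__ import annotations

vectors = {
    "d8f940f1-d6c1-4d8e-82c1-488eb7801e57": [0.1, 0.2, 0.3],  # drama
    "c47eade8-59b9-4c49-9172-a0ce3d9dd0af": [0.2, 0.3, 0.4],  # action
}
query = [0.15, 0.12, 1.23]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(query: list[float], vectors: dict[str, list[float]], k: int = 1) -> list[tuple[float, str]]:
    """Exact k-NN: score every vector, sort by distance, keep the k closest."""
    scored = sorted((squared_l2(query, values), vector_id) for vector_id, values in vectors.items())
    return scored[:k]


for distance, vector_id in knn(query, vectors, k=2):
    print(f"{vector_id} {distance:.4f}")
# c47eade8-59b9-4c49-9172-a0ce3d9dd0af 0.7238
# d8f940f1-d6c1-4d8e-82c1-488eb7801e57 0.8738
```

<p>Exact search like this scans every vector on every query, which is fine for two vectors and is what approximate indexes exist to avoid at scale.</p>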
<h3 id="clickhouse">ClickHouse</h3>
<p><a href="https://clickhouse.com/">ClickHouse</a> is a fast and resource efficient open-source database for real-time apps and analytics. You can <a href="https://clickhouse.com/#getting_started">download a free version</a> or use <a href="https://clickhouse.com/">ClickHouse Cloud</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker run <span class="nt">-p</span> 9000:9000 <span class="nt">-p</span> 9009:9009 <span class="nt">-p</span> 8123:8123 <span class="nt">--platform</span> linux/amd64 <span class="nt">--ulimit</span> <span class="nv">nofile</span><span class="o">=</span>262144:262144 clickhouse/clickhouse-server</code></pre></figure>
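<p>ClickHouse offers an HTTP interface on port 8123: queries are sent as the body of a POST, and a plain GET on the root returns <code class="language-plaintext highlighter-rouge">Ok.</code>.</p>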
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">endpoint</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s">"ENDPOINT"</span><span class="p">,</span> <span class="s">"http://localhost:8123"</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="n">client</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">endpoint</span><span class="p">).</span><span class="n">text</span><span class="p">)</span></code></pre></figure>
<p>Create a table with a k-NN index. Note the <code class="language-plaintext highlighter-rouge">allow_experimental_annoy_index=1</code> query-string parameter, which turns on the experimental <a href="https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes">approximate nearest neighbor</a> index feature.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="s">"allow_experimental_annoy_index=1"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span>
<span class="s">"CREATE TABLE IF NOT EXISTS default.vectors ("</span> \
<span class="s">"id String,"</span> \
<span class="s">"values Array(Float32),"</span> \
<span class="s">"metadata Map(String, String),"</span> \
<span class="s">"CONSTRAINT check_length CHECK length(values) = 3,"</span> \
<span class="s">"INDEX values_index values TYPE annoy GRANULARITY 100"</span> \
<span class="s">") "</span> \
<span class="s">"ENGINE = MergeTree "</span> \
<span class="s">"ORDER BY id"</span>
<span class="p">)</span></code></pre></figure>
<p>Insert some vectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec1"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec2"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"action"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">]</span>
<span class="k">for</span> <span class="n">vector</span> <span class="ow">in</span> <span class="n">vectors</span><span class="p">:</span>
<span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span>
<span class="sa">f</span><span class="s">"INSERT INTO default.vectors (id, values, metadata) "</span> \
<span class="sa">f</span><span class="s">"VALUES (</span><span class="se">\'</span><span class="si">{</span><span class="n">vector</span><span class="p">[</span><span class="s">'id'</span><span class="p">]</span><span class="si">}</span><span class="se">\'</span><span class="s">, </span><span class="si">{</span><span class="n">vector</span><span class="p">[</span><span class="s">'values'</span><span class="p">]</span><span class="si">}</span><span class="s">, </span><span class="si">{</span><span class="n">vector</span><span class="p">[</span><span class="s">'metadata'</span><span class="p">]</span><span class="si">}</span><span class="s">)"</span>
<span class="p">)</span></code></pre></figure>
<p>Search.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">results</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span>
<span class="s">"SELECT * "</span> \
<span class="s">"FROM default.vectors "</span> \
<span class="s">"WHERE metadata['genre']='action' "</span> \
<span class="s">"ORDER BY L2Distance(values, [0.2, 0.3, 0.4])"</span>
<span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">results</span><span class="p">.</span><span class="n">text</span><span class="p">)</span> </code></pre></figure>
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/click_house/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">poetry run ./hello.py
<span class="o">></span> POST http://localhost:8123?allow_experimental_annoy_index<span class="o">=</span>1
CREATE TABLE IF NOT EXISTS default.vectors <span class="o">(</span><span class="nb">id </span>String,values Array<span class="o">(</span>Float32<span class="o">)</span>,metadata Map<span class="o">(</span>String, String<span class="o">)</span>,CONSTRAINT check_length CHECK length<span class="o">(</span>values<span class="o">)</span> <span class="o">=</span> 3,INDEX values_index values TYPE annoy GRANULARITY 100<span class="o">)</span> ENGINE <span class="o">=</span> MergeTree ORDER BY <span class="nb">id</span>
< POST http://localhost:8123?allow_experimental_annoy_index<span class="o">=</span>1 - 200
<span class="o">></span> POST http://localhost:8123
INSERT INTO default.vectors <span class="o">(</span><span class="nb">id</span>, values, metadata<span class="o">)</span> VALUES <span class="o">(</span><span class="s1">'vec1'</span>, <span class="o">[</span>0.1, 0.2, 0.3], <span class="o">{</span><span class="s1">'genre'</span>: <span class="s1">'drama'</span><span class="o">})</span>
< POST http://localhost:8123 - 200
<span class="o">></span> POST http://localhost:8123
INSERT INTO default.vectors <span class="o">(</span><span class="nb">id</span>, values, metadata<span class="o">)</span> VALUES <span class="o">(</span><span class="s1">'vec2'</span>, <span class="o">[</span>0.2, 0.3, 0.4], <span class="o">{</span><span class="s1">'genre'</span>: <span class="s1">'action'</span><span class="o">})</span>
< POST http://localhost:8123 - 200
<span class="o">></span> POST http://localhost:8123
SELECT <span class="k">*</span> FROM default.vectors WHERE metadata[<span class="s1">'genre'</span><span class="o">]=</span><span class="s1">'action'</span> ORDER BY L2Distance<span class="o">(</span>values, <span class="o">[</span>0.2, 0.3, 0.4]<span class="o">)</span>
< POST http://localhost:8123 - 200
vec2 <span class="o">[</span>0.2,0.3,0.4] <span class="o">{</span><span class="s1">'genre'</span>:<span class="s1">'action'</span><span class="o">}</span>
<span class="o">></span> POST http://localhost:8123
DROP TABLE default.vectors
< POST http://localhost:8123 - 200</code></pre></figure>
<h3 id="myscale">MyScale</h3>
<p><a href="https://myscale.com">MyScale</a> performs vector search in SQL, and <a href="https://web.archive.org/web/20230517145148/https://blog.myscale.com/2023/05/17/myscale-outperform-special-vectordb/">claims</a> to outperform other solutions by using a proprietary algorithm called <code class="language-plaintext highlighter-rouge">MSTG</code>. MyScale is built on the open-source ClickHouse, so the code is almost identical, except that one uses <code class="language-plaintext highlighter-rouge">VECTOR INDEX values_index values TYPE MSTG</code>.</p>
<p>Sign up <a href="https://myscale.com">on their website</a> for a test cluster, note the username and password. You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/my_scale/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">USERNAME</span><span class="o">=</span>... <span class="nv">PASSWORD</span><span class="o">=</span>... <span class="nv">ENDPOINT</span><span class="o">=</span>https://...aws.myscale.com:443 poetry run ./hello.py
<span class="o">></span> POST https://...aws.myscale.com
CREATE TABLE IF NOT EXISTS default.vectors <span class="o">(</span><span class="nb">id </span>String,values Array<span class="o">(</span>Float32<span class="o">)</span>,metadata Map<span class="o">(</span>String, String<span class="o">)</span>,CONSTRAINT check_length CHECK length<span class="o">(</span>values<span class="o">)</span> <span class="o">=</span> 3,VECTOR INDEX values_index values TYPE MSTG<span class="o">)</span> ENGINE <span class="o">=</span> MergeTree ORDER BY <span class="nb">id</span>
< POST https://...aws.myscale.com - 200
<span class="o">></span> POST https://...aws.myscale.com
INSERT INTO default.vectors <span class="o">(</span><span class="nb">id</span>, values, metadata<span class="o">)</span> VALUES <span class="o">(</span><span class="s1">'vec1'</span>, <span class="o">[</span>0.1, 0.2, 0.3], <span class="o">{</span><span class="s1">'genre'</span>: <span class="s1">'drama'</span><span class="o">})</span>
< POST https://...aws.myscale.com - 200
<span class="o">></span> POST https://...aws.myscale.com
INSERT INTO default.vectors <span class="o">(</span><span class="nb">id</span>, values, metadata<span class="o">)</span> VALUES <span class="o">(</span><span class="s1">'vec2'</span>, <span class="o">[</span>0.2, 0.3, 0.4], <span class="o">{</span><span class="s1">'genre'</span>: <span class="s1">'action'</span><span class="o">})</span>
< POST https://...aws.myscale.com - 200
<span class="o">></span> POST https://...aws.myscale.com
SELECT <span class="k">*</span> FROM default.vectors WHERE metadata[<span class="s1">'genre'</span><span class="o">]=</span><span class="s1">'action'</span> ORDER BY L2Distance<span class="o">(</span>values, <span class="o">[</span>0.2, 0.3, 0.4]<span class="o">)</span>
< POST https://...aws.myscale.com - 200
vec2 <span class="o">[</span>0.2,0.3,0.4] <span class="o">{</span><span class="s1">'genre'</span>:<span class="s1">'action'</span><span class="o">}</span>
<span class="o">></span> POST https://...aws.myscale.com
DROP TABLE default.vectors
< POST https://...aws.myscale.com - 200</code></pre></figure>
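<p>The statements in this log are plain SQL sent over HTTPS. As a minimal sketch, assuming the ClickHouse HTTP interface (the statement is the <code class="language-plaintext highlighter-rouge">POST</code> body and the username and password go in basic auth), the table definition and the request could be produced as follows. <code class="language-plaintext highlighter-rouge">create_table_sql</code> and <code class="language-plaintext highlighter-rouge">run_sql</code> are hypothetical helpers, not taken from the sample, and <code class="language-plaintext highlighter-rouge">client</code> is assumed to be an <code class="language-plaintext highlighter-rouge">httpx.Client</code>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def create_table_sql(table, dimension, index_type="MSTG"):
    # Hypothetical helper: rebuilds the CREATE TABLE statement from the log.
    # The VECTOR INDEX clause is the MyScale-specific part.
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id String,values Array(Float32),metadata Map(String, String),"
        f"CONSTRAINT check_length CHECK length(values) = {dimension},"
        f"VECTOR INDEX values_index values TYPE {index_type}) "
        "ENGINE = MergeTree ORDER BY id"
    )


def run_sql(client, endpoint, username, password, sql):
    # Hypothetical helper, assuming the ClickHouse HTTP interface:
    # the SQL statement is the POST body, credentials use basic auth.
    response = client.post(endpoint, content=sql, auth=(username, password))
    response.raise_for_status()
    return response.text</code></pre></figure>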
<h3 id="opensearch">OpenSearch</h3>
<p><a href="https://opensearch.org/">OpenSearch</a> is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications licensed under Apache 2.0. You can use a managed service, such as <a href="https://aws.amazon.com/opensearch-service/">Amazon OpenSearch</a>, or download and install it locally. I usually do the latter, mostly because it’s trivial, and I can work offline.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker pull opensearchproject/opensearch:latest
docker run <span class="nt">-d</span> <span class="nt">-p</span> 9200:9200 <span class="nt">-p</span> 9600:9600 <span class="nt">-e</span> <span class="s2">"discovery.type=single-node"</span> opensearchproject/opensearch:latest</code></pre></figure>
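<p>Whichever option you choose, you get a single endpoint (e.g. “https://localhost:9200”). The local install uses basic auth and a self-signed SSL certificate, so the client needs <code class="language-plaintext highlighter-rouge">verify=False</code>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urljoin</span>
<span class="kn">from</span> <span class="nn">httpx</span> <span class="kn">import</span> <span class="n">BasicAuth</span><span class="p">,</span> <span class="n">Client</span>
<span class="n">endpoint</span> <span class="o">=</span> <span class="s">"https://localhost:9200"</span>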
<span class="n">username</span> <span class="o">=</span> <span class="s">"admin"</span>
<span class="n">password</span> <span class="o">=</span> <span class="s">"admin"</span>
<span class="n">auth</span> <span class="o">=</span> <span class="n">BasicAuth</span><span class="p">(</span><span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">,</span> <span class="n">password</span><span class="o">=</span><span class="n">password</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">(</span><span class="n">verify</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">auth</span><span class="o">=</span><span class="n">auth</span><span class="p">)</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"Accept"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
<span class="s">"Content-Type"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
<span class="p">}</span></code></pre></figure>
<p>We can get a list of existing indexes. The response has a ton of useful information about each index; we’ll turn it into a dictionary keyed by index name and use it to check whether an index exists.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">indices</span> <span class="o">=</span> <span class="p">{</span> <span class="n">x</span><span class="p">[</span><span class="s">"index"</span><span class="p">]:</span> <span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span>
<span class="n">client</span><span class="p">.</span><span class="n">get</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">"/_cat/indices"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span>
<span class="p">).</span><span class="n">json</span><span class="p">()</span>
<span class="p">}</span></code></pre></figure>
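<p>If an index doesn’t exist, we can create one. The request body enables k-NN vector search in the index settings and declares property mappings: the <code class="language-plaintext highlighter-rouge">values</code> field is a <code class="language-plaintext highlighter-rouge">knn_vector</code> with a fixed number of dimensions (3) for our vectors.</p>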
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">client</span><span class="p">.</span><span class="n">put</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="sa">f</span><span class="s">"/</span><span class="si">{</span><span class="n">index_name</span><span class="si">}</span><span class="s">"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="p">{</span>
<span class="s">"settings"</span><span class="p">:</span> <span class="p">{</span><span class="s">"index.knn"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span>
<span class="s">"mappings"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"properties"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"type"</span><span class="p">:</span> <span class="s">"knn_vector"</span><span class="p">,</span>
<span class="s">"dimension"</span><span class="p">:</span> <span class="mi">3</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">)</span></code></pre></figure>
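<p>Because <code class="language-plaintext highlighter-rouge">dimension</code> is fixed when the index is created, documents whose vectors have a different length are expected to be rejected at indexing time. A minimal sketch that builds the same mapping from one value and checks vectors against it before sending them; <code class="language-plaintext highlighter-rouge">knn_index_body</code> and <code class="language-plaintext highlighter-rouge">check_dimensions</code> are hypothetical helpers, not part of OpenSearch or the sample.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def knn_index_body(dimension, field="values"):
    # Hypothetical helper: same settings and mappings as the request above,
    # parameterized by the vector size.
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


def check_dimensions(vectors, dimension):
    # Hypothetical helper: collect the ids of vectors whose length differs
    # from the mapping, and fail before any request is made.
    bad = [v["id"] for v in vectors if len(v["values"]) != dimension]
    if bad:
        raise ValueError(f"expected {dimension} dimensions for {bad}")
    return vectors</code></pre></figure>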
<p>Indexing data can be done document-by-document or via the bulk API, which requires newline-delimited JSON. We start with some data.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec1"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec2"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"action"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">]</span></code></pre></figure>
<p>You can insert document-by-document.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">for</span> <span class="n">vector</span> <span class="ow">in</span> <span class="n">vectors</span><span class="p">:</span>
    <span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
        <span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="sa">f</span><span class="s">"/</span><span class="si">{</span><span class="n">index_name</span><span class="si">}</span><span class="s">/_doc/</span><span class="si">{</span><span class="n">vector</span><span class="p">[</span><span class="s">'id'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">),</span>
        <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
        <span class="n">json</span><span class="o">=</span><span class="n">vector</span>
    <span class="p">)</span></code></pre></figure>
<p>Or use the bulk API, which expects document IDs to be separate from document data. I purposely started from combined vector documents that include IDs, and transform them into the newline-delimited JSON that the bulk API accepts: an action line with <code class="language-plaintext highlighter-rouge">_index</code> and <code class="language-plaintext highlighter-rouge">_id</code>, followed by a line with the document itself.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">json</span>
<span class="n">data</span> <span class="o">=</span> <span class="s">""</span>
<span class="k">for</span> <span class="n">vector</span> <span class="ow">in</span> <span class="n">vectors</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">+=</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">({</span> <span class="s">"index"</span><span class="p">:</span> <span class="p">{</span><span class="s">"_index"</span><span class="p">:</span> <span class="n">index_name</span><span class="p">,</span> <span class="s">"_id"</span><span class="p">:</span> <span class="n">vector</span><span class="p">[</span><span class="s">"id"</span><span class="p">]}</span> <span class="p">})</span> <span class="o">+</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span>
    <span class="n">data</span> <span class="o">+=</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">({</span><span class="n">i</span><span class="p">:</span> <span class="n">vector</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">vector</span> <span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="s">"id"</span><span class="p">})</span> <span class="o">+</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span>
<span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span><span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">"/_bulk"</span><span class="p">),</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">)</span></code></pre></figure>
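<p>The same transformation can be wrapped in a small function that is easier to test. This is a sketch; <code class="language-plaintext highlighter-rouge">to_bulk_ndjson</code> is a hypothetical name. It joins lines instead of concatenating strings, keeps the trailing newline the bulk API requires, and suggests the <code class="language-plaintext highlighter-rouge">application/x-ndjson</code> content type conventionally used for newline-delimited JSON, whereas the sample reuses the shared <code class="language-plaintext highlighter-rouge">application/json</code> headers.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import json


def to_bulk_ndjson(index_name, vectors):
    # Each document becomes two lines: an action line carrying _index and
    # _id, then the document source without the "id" key.
    lines = []
    for vector in vectors:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": vector["id"]}}))
        lines.append(json.dumps({k: v for k, v in vector.items() if k != "id"}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


bulk_headers = {"Content-Type": "application/x-ndjson"}</code></pre></figure>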
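<p>Search for the nearest neighbor of <code class="language-plaintext highlighter-rouge">[0.1, 0.2, 0.3]</code> with a k-NN query, where <code class="language-plaintext highlighter-rouge">k</code> is the number of neighbors to return.</p>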
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">query</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"query"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"knn"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"vector"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"k"</span><span class="p">:</span> <span class="mi">1</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="sa">f</span><span class="s">"/</span><span class="si">{</span><span class="n">index_name</span><span class="si">}</span><span class="s">/_search"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="n">query</span>
<span class="p">).</span><span class="n">json</span><span class="p">()</span></code></pre></figure>
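<p>The response nests matches under <code class="language-plaintext highlighter-rouge">hits.hits</code>, each with an <code class="language-plaintext highlighter-rouge">_id</code>, a <code class="language-plaintext highlighter-rouge">_score</code> and the stored <code class="language-plaintext highlighter-rouge">_source</code>. A minimal sketch that flattens it; <code class="language-plaintext highlighter-rouge">top_hits</code> is a hypothetical helper.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def top_hits(results):
    # Hypothetical helper: pull (id, score, source) out of a _search
    # response, keeping the order OpenSearch returned them in.
    return [
        (hit["_id"], hit["_score"], hit["_source"])
        for hit in results["hits"]["hits"]
    ]</code></pre></figure>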
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/open_search/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">USERNAME</span><span class="o">=</span>admin <span class="nv">PASSWORD</span><span class="o">=</span>admin <span class="nv">ENDPOINT</span><span class="o">=</span>https://localhost:9200 poetry run src/open_search/hello.py
<span class="o">></span> GET https://localhost:9200/_cat/indices
< GET https://localhost:9200/_cat/indices - 200
<span class="o">></span> PUT https://localhost:9200/my-index
< PUT https://localhost:9200/my-index - 200
<span class="o">></span> POST https://localhost:9200/_bulk
< POST https://localhost:9200/_bulk - 200
<span class="o">></span> POST https://localhost:9200/my-index/_search
< POST https://localhost:9200/my-index/_search - 200
<span class="o">{</span><span class="s1">'total'</span>: <span class="o">{</span><span class="s1">'value'</span>: 1, <span class="s1">'relation'</span>: <span class="s1">'eq'</span><span class="o">}</span>, <span class="s1">'max_score'</span>: 0.97087383, <span class="s1">'hits'</span>: <span class="o">[{</span><span class="s1">'_index'</span>: <span class="s1">'my-index'</span>, <span class="s1">'_id'</span>: <span class="s1">'vec1'</span>, <span class="s1">'_score'</span>: 0.97087383, <span class="s1">'_source'</span>: <span class="o">{</span><span class="s1">'index'</span>: <span class="o">{</span><span class="s1">'_index'</span>: <span class="s1">'my-index'</span>, <span class="s1">'_id'</span>: <span class="s1">'vec2'</span><span class="o">}</span>, <span class="s1">'values'</span>: <span class="o">[</span>0.2, 0.3, 0.4], <span class="s1">'metadata'</span>: <span class="o">{</span><span class="s1">'genre'</span>: <span class="s1">'action'</span><span class="o">}}}]}</span></code></pre></figure>
<h3 id="pgvector">pgVector</h3>
<p><a href="https://github.com/pgvector/pgvector">pgVector</a> adds vector similarity search to open-source Postgres. You can use a local docker installation from <a href="https://hub.docker.com/r/ankane/pgvector">ankane/pgvector</a>, or a <a href="https://github.com/pgvector/pgvector#hosted-postgres">managed service</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker pull ankane/pgvector <span class="c"># or see https://github.com/pgvector/pgvector/issues/54 for cloud providers</span>
docker run <span class="nt">-e</span> <span class="nv">POSTGRES_PASSWORD</span><span class="o">=</span>password <span class="nt">-p</span> 5433:5432 ankane/pgvector</code></pre></figure>
<p>PostgreSQL speaks its own message-based wire protocol rather than HTTP, and queries are written in SQL, so we’re going to use <a href="https://github.com/MagicStack/asyncpg">asyncpg</a>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">asyncpg</span>
<span class="n">database</span> <span class="o">=</span> <span class="s">"vectors"</span>
<span class="c1"># connect to an existing database, since "vectors" does not exist yet</span>
<span class="n">conn</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncpg</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span><span class="n">database</span><span class="o">=</span><span class="s">"template1"</span><span class="p">)</span>
<span class="k">await</span> <span class="n">conn</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="sa">f</span><span class="s">"CREATE DATABASE </span><span class="se">\"</span><span class="si">{</span><span class="n">database</span><span class="si">}</span><span class="se">\"</span><span class="s">"</span><span class="p">)</span></code></pre></figure>
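<p>Enable the <code class="language-plaintext highlighter-rouge">vector</code> extension and register the pgvector type codec on the connection, so that vectors are converted to and from Python values automatically.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pgvector.asyncpg</span>
<span class="k">await</span> <span class="n">conn</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"CREATE EXTENSION vector"</span><span class="p">)</span>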
<span class="k">await</span> <span class="n">pgvector</span><span class="p">.</span><span class="n">asyncpg</span><span class="p">.</span><span class="n">register_vector</span><span class="p">(</span><span class="n">conn</span><span class="p">)</span></code></pre></figure>
<p>Create a schema with a custom primary key, a 3-dimensional vector, and some JSON metadata.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">await</span> <span class="n">conn</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span>
<span class="sa">f</span><span class="s">"CREATE TABLE vectors (id text PRIMARY KEY, values vector(3), metadata JSONB)"</span>
<span class="p">)</span></code></pre></figure>
<p>Insert vectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec1"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec2"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"action"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">]</span>
<span class="k">for</span> <span class="n">vector</span> <span class="ow">in</span> <span class="n">vectors</span><span class="p">:</span>
    <span class="n">q</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"INSERT INTO vectors (id, values, metadata) VALUES($1, $2, $3)"</span>
    <span class="k">await</span> <span class="n">conn</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="n">q</span><span class="p">,</span>
        <span class="n">vector</span><span class="p">[</span><span class="s">'id'</span><span class="p">],</span>
        <span class="n">vector</span><span class="p">[</span><span class="s">'values'</span><span class="p">],</span>
        <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">vector</span><span class="p">[</span><span class="s">'metadata'</span><span class="p">])</span>
    <span class="p">)</span></code></pre></figure>
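<p>Search, filtering by <code class="language-plaintext highlighter-rouge">genre</code> and ordering results by L2 distance (<code class="language-plaintext highlighter-rouge">&lt;-&gt;</code>) from a query vector.</p>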
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">q</span> <span class="o">=</span> <span class="s">"SELECT * FROM vectors WHERE metadata->>'genre'='action' ORDER BY values <-> '[0.2,0.1,0.5]'"</span>
<span class="n">results</span> <span class="o">=</span> <span class="k">await</span> <span class="n">conn</span><span class="p">.</span><span class="n">fetch</span><span class="p">(</span><span class="n">q</span><span class="p">)</span>
<span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s"> (</span><span class="si">{</span><span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="s">'metadata'</span><span class="p">])[</span><span class="s">'genre'</span><span class="p">]</span><span class="si">}</span><span class="s">)"</span><span class="p">)</span></code></pre></figure>
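<p>To make the query’s semantics concrete, here is a pure-Python sketch of what it computes: keep rows whose <code class="language-plaintext highlighter-rouge">genre</code> is <code class="language-plaintext highlighter-rouge">action</code>, then sort by Euclidean distance, which is what <code class="language-plaintext highlighter-rouge">&lt;-&gt;</code> measures in pgvector (<code class="language-plaintext highlighter-rouge">&lt;#&gt;</code> is negative inner product and <code class="language-plaintext highlighter-rouge">&lt;=&gt;</code> is cosine distance). It is an illustration only; Postgres does this server-side, and <code class="language-plaintext highlighter-rouge">l2_distance</code> and <code class="language-plaintext highlighter-rouge">search</code> are hypothetical names.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import math


def l2_distance(a, b):
    # Euclidean (L2) distance, the metric behind pgvector's L2 operator.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(rows, genre, query):
    # Same shape as the SQL above: filter on metadata, then order by distance.
    matches = [row for row in rows if row["metadata"].get("genre") == genre]
    return sorted(matches, key=lambda row: l2_distance(row["values"], query))


rows = [
    {"id": "vec1", "values": [0.1, 0.2, 0.3], "metadata": {"genre": "drama"}},
    {"id": "vec2", "values": [0.2, 0.3, 0.4], "metadata": {"genre": "action"}},
]
print([row["id"] for row in search(rows, "action", [0.2, 0.1, 0.5])])  # ['vec2']</code></pre></figure>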
<p>Finally, drop this database.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">await</span> <span class="n">conn</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="sa">f</span><span class="s">"DROP DATABASE </span><span class="se">\"</span><span class="si">{</span><span class="n">database</span><span class="si">}</span><span class="se">\"</span><span class="s">"</span><span class="p">)</span></code></pre></figure>
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/pg_vector/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">PGPORT</span><span class="o">=</span>5433 <span class="nv">PGUSER</span><span class="o">=</span>postgres <span class="nv">PGPASSWORD</span><span class="o">=</span>password poetry run ./hello.py
<Record <span class="nb">id</span><span class="o">=</span><span class="s1">'vec2'</span> <span class="nv">values</span><span class="o">=</span>array<span class="o">([</span>0.2, 0.3, 0.4], <span class="nv">dtype</span><span class="o">=</span>float32<span class="o">)</span> <span class="nv">metadata</span><span class="o">=</span><span class="s1">'{"genre": "action"}'</span><span class="o">></span> <span class="o">(</span>action<span class="o">)</span></code></pre></figure>
<h3 id="pinecone">Pinecone</h3>
<p>The <a href="https://www.pinecone.io/">Pinecone vector database</a> is a fully managed, developer-friendly service that makes it easy to build high-performance vector search applications that scale without infrastructure hassles.</p>
<p>Conceptually, it has indexes (which are really databases, and were probably originally called that, since the API has <code class="language-plaintext highlighter-rouge">/databases</code> in it). After signing up to Pinecone you get a regional endpoint and a project ID. The regional endpoint forms a controller URI (e.g. <code class="language-plaintext highlighter-rouge">https://controller.us-west4-gcp-free.pinecone.io/</code>) for database operations. Each index you create gets its own URI that combines the index name (e.g. “my-index”) and the project ID (e.g. <code class="language-plaintext highlighter-rouge">https://my-index-c7556fa.svc.us-west4-gcp-free.pinecone.io</code>). It’s not quite serverless, as you do have to reason about <a href="https://docs.pinecone.io/docs/indexes">pods</a>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urljoin</span><span class="p">,</span> <span class="n">urlparse</span>
<span class="n">endpoint</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="s">"https://us-west4-gcp-free.pinecone.io"</span><span class="p">)</span>
<span class="n">project_id</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"PROJECT_ID"</span><span class="p">]</span>
<span class="n">controller_endpoint</span> <span class="o">=</span> <span class="n">endpoint</span><span class="p">.</span><span class="n">_replace</span><span class="p">(</span><span class="n">netloc</span><span class="o">=</span><span class="sa">f</span><span class="s">"controller.</span><span class="si">{</span><span class="n">endpoint</span><span class="p">.</span><span class="n">netloc</span><span class="si">}</span><span class="s">"</span><span class="p">).</span><span class="n">geturl</span><span class="p">()</span>
<span class="n">service_endpoint</span> <span class="o">=</span> <span class="n">endpoint</span><span class="p">.</span><span class="n">_replace</span><span class="p">(</span><span class="n">netloc</span><span class="o">=</span><span class="sa">f</span><span class="s">'my-index-</span><span class="si">{</span><span class="n">project_id</span><span class="si">}</span><span class="s">.svc.</span><span class="si">{</span><span class="n">endpoint</span><span class="p">.</span><span class="n">netloc</span><span class="si">}</span><span class="s">'</span><span class="p">).</span><span class="n">geturl</span><span class="p">()</span></code></pre></figure>
<p>Authentication is performed using a required API key.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">httpx</span> <span class="kn">import</span> <span class="n">Client</span>
<span class="n">api_key</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"API_KEY"</span><span class="p">]</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">()</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"Accept"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
<span class="s">"Content-Type"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
<span class="s">"Api-Key"</span><span class="p">:</span> <span class="n">api_key</span><span class="p">,</span>
<span class="p">}</span></code></pre></figure>
<p>We can get a list of existing indexes. This is just a list of names, useful to check whether an index exists.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">indices</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">get</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">controller_endpoint</span><span class="p">,</span> <span class="s">"/databases"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span>
<span class="p">).</span><span class="n">json</span><span class="p">()</span></code></pre></figure>
<p>If an index doesn’t exist, we can create one. It will need to have a fixed number of dimensions for our vectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">controller_endpoint</span><span class="p">,</span> <span class="s">"/databases"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="p">{</span><span class="s">"name"</span><span class="p">:</span> <span class="n">index_name</span><span class="p">,</span> <span class="s">"dimension"</span><span class="p">:</span> <span class="mi">3</span><span class="p">},</span>
<span class="p">)</span></code></pre></figure>
<p>Index data.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec1"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec2"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"action"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">]</span>
<span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">service_endpoint</span><span class="p">,</span> <span class="s">"/vectors/upsert"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="p">{</span><span class="s">"vectors"</span><span class="p">:</span> <span class="n">vectors</span><span class="p">,</span> <span class="s">"namespace"</span><span class="p">:</span> <span class="s">"namespace"</span><span class="p">},</span>
<span class="p">)</span></code></pre></figure>
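<p>Two vectors fit in one request, but larger data sets are usually split across several upserts because Pinecone caps the size of a single request. A minimal sketch, assuming a batch size of 100 (check Pinecone’s current limits before relying on it); <code class="language-plaintext highlighter-rouge">batches</code> and <code class="language-plaintext highlighter-rouge">upsert_all</code> are hypothetical helpers, and <code class="language-plaintext highlighter-rouge">client</code> is the <code class="language-plaintext highlighter-rouge">httpx.Client</code> from above.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from urllib.parse import urljoin


def batches(items, size=100):
    # Split a list into consecutive chunks of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_all(client, service_endpoint, headers, vectors, namespace, size=100):
    # Hypothetical helper: one /vectors/upsert request per batch.
    for batch in batches(vectors, size):
        response = client.post(
            urljoin(service_endpoint, "/vectors/upsert"),
            headers=headers,
            json={"vectors": batch, "namespace": namespace},
        )
        response.raise_for_status()</code></pre></figure>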
<p>Search for this vector data.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">results</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">service_endpoint</span><span class="p">,</span> <span class="s">"/query"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">json</span><span class="o">=</span><span class="p">{</span>
<span class="s">"vector"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"top_k"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s">"namespace"</span><span class="p">:</span> <span class="s">"namespace"</span><span class="p">,</span>
<span class="s">"includeMetadata"</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">).</span><span class="n">json</span><span class="p">()</span></code></pre></figure>
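<p>The query response lists hits under <code class="language-plaintext highlighter-rouge">matches</code>, each with an <code class="language-plaintext highlighter-rouge">id</code>, a <code class="language-plaintext highlighter-rouge">score</code> and, because <code class="language-plaintext highlighter-rouge">includeMetadata</code> is set, the stored <code class="language-plaintext highlighter-rouge">metadata</code>. A sketch that pulls those out; <code class="language-plaintext highlighter-rouge">best_matches</code> is a hypothetical name.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def best_matches(results):
    # Hypothetical helper: (id, score, metadata) for each match, falling back
    # to an empty dict when metadata was not requested.
    return [
        (m["id"], m["score"], m.get("metadata", {}))
        for m in results.get("matches", [])
    ]</code></pre></figure>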
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/pinecone/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">API_KEY</span><span class="o">=</span>... <span class="nv">PROJECT_ID</span><span class="o">=</span>... <span class="nv">ENDPOINT</span><span class="o">=</span>https://us-west4-gcp-free.pinecone.io poetry run src/pinecone/hello.py
<span class="o">></span> GET https://controller.us-west4-gcp-free.pinecone.io/databases
< GET https://controller.us-west4-gcp-free.pinecone.io/databases - 200
<span class="o">></span> POST https://my-index-c7556fa.svc.us-west4-gcp-free.pinecone.io/vectors/upsert
< POST https://my-index-c7556fa.svc.us-west4-gcp-free.pinecone.io/vectors/upsert - 200
<span class="o">></span> POST https://my-index-c7556fa.svc.us-west4-gcp-free.pinecone.io/query
< POST https://my-index-c7556fa.svc.us-west4-gcp-free.pinecone.io/query - 200
<span class="o">{</span><span class="s1">'results'</span>: <span class="o">[]</span>, <span class="s1">'matches'</span>: <span class="o">[{</span><span class="s1">'id'</span>: <span class="s1">'vec1'</span>, <span class="s1">'score'</span>: 0.999999881, <span class="s1">'values'</span>: <span class="o">[]</span>, <span class="s1">'metadata'</span>: <span class="o">{</span><span class="s1">'genre'</span>: <span class="s1">'drama'</span><span class="o">}}]</span>, <span class="s1">'namespace'</span>: <span class="s1">'namespace'</span><span class="o">}</span></code></pre></figure>
<h3 id="qdrant">Qdrant</h3>
<p><a href="https://qdrant.tech/">Qdrant</a> is a similarity vector search engine designed for a wide range of applications, including recommendation systems, image search, and natural language processing. It is scalable and allows dynamic updates to the index. It is particularly suitable for scenarios where the vector data is constantly evolving and vectors may be modified without interrupting the search functionality. Qdrant is licensed under Apache 2.0.</p>
<p>Qdrant organizes vectors into indexes, which it calls “collections”, for quick retrieval. Currently, HNSW (Hierarchical Navigable Small World) is its only vector index type.</p>
<p>After you sign up for Qdrant Cloud Services, create a new free-tier Qdrant cluster with authentication. Note your cluster URL and API key. The endpoint has the following format: <code class="language-plaintext highlighter-rouge">https://my-cluster.cloud.qdrant.io:6333/</code>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">()</span>
<span class="n">endpoint</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"ENDPOINT"</span><span class="p">]</span>
<span class="n">api_key</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"API_KEY"</span><span class="p">]</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"Accept"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
<span class="s">"Content-Type"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
<span class="s">"api-key"</span><span class="p">:</span> <span class="n">api_key</span>
<span class="p">}</span></code></pre></figure>
<p>List the existing collections, then create a new collection, which is Qdrant’s equivalent of an index.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">index_name</span> <span class="o">=</span> <span class="s">"my-index"</span>
<span class="n">index</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"vectors"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"size"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
<span class="s">"distance"</span><span class="p">:</span> <span class="s">"Cosine"</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s">"vector"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"payload"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="s">"vector"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="s">"payload"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"genre"</span><span class="p">:</span> <span class="s">"action"</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="n">payload</span> <span class="o">=</span> <span class="p">{</span><span class="s">"points"</span><span class="p">:</span> <span class="n">vectors</span><span class="p">}</span>
<span class="n">client</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">"collections"</span><span class="p">),</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">),</span>
<span class="n">client</span><span class="p">.</span><span class="n">put</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="sa">f</span><span class="s">"/collections/</span><span class="si">{</span><span class="n">index_name</span><span class="si">}</span><span class="s">"</span><span class="p">),</span>
<span class="n">json</span><span class="o">=</span><span class="n">index</span><span class="p">,</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span>
<span class="p">)</span></code></pre></figure>
<p>Upload some vectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">client</span><span class="p">.</span><span class="n">put</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="sa">f</span><span class="s">"/collections/</span><span class="si">{</span><span class="n">index_name</span><span class="si">}</span><span class="s">/points?wait=true"</span><span class="p">),</span>
<span class="n">data</span><span class="o">=</span><span class="n">dumps</span><span class="p">(</span><span class="n">payload</span><span class="p">),</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span></code></pre></figure>
<p>Search.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">query</span> <span class="o">=</span> <span class="s">'{"vector": [0.1,0.2,0.3], "limit": 1}'</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="sa">f</span><span class="s">"/collections/</span><span class="si">{</span><span class="n">index_name</span><span class="si">}</span><span class="s">/points/search"</span><span class="p">),</span>
<span class="n">data</span><span class="o">=</span><span class="n">query</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">json</span><span class="p">())</span></code></pre></figure>
<p>Finally, delete the collection along with all of its vectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">client</span><span class="p">.</span><span class="n">delete</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="sa">f</span><span class="s">"/collections/</span><span class="si">{</span><span class="n">index_name</span><span class="si">}</span><span class="s">"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span>
<span class="p">)</span></code></pre></figure>
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/qdrant/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">API_KEY</span><span class="o">=</span>... <span class="nv">ENDPOINT</span><span class="o">=</span>https://my-cluster.cloud.qdrant.io:6333 poetry run src/qdrant/hello.py
<span class="o">></span> GET https://my-cluster.cloud.qdrant.io:6333/collections
< GET https://my-cluster.cloud.qdrant.io:6333/collections - 200
<span class="o">></span> PUT https://my-cluster.cloud.qdrant.io:6333/collections/my-index
< PUT https://my-cluster.cloud.qdrant.io:6333/collections/my-index - 200
<span class="o">></span> PUT https://my-cluster.cloud.qdrant.io:6333/collections/my-index/points?wait<span class="o">=</span><span class="nb">true</span>
< PUT https://my-cluster.cloud.qdrant.io:6333/collections/my-index/points?wait<span class="o">=</span><span class="nb">true</span> - 200
<span class="o">></span> POST https://my-cluster.cloud.qdrant.io:6333/collections/my-index/points/search
< POST https://my-cluster.cloud.qdrant.io:6333/collections/my-index/points/search - 200
<span class="o">{</span><span class="s1">'result'</span>: <span class="o">[{</span><span class="s1">'id'</span>: 1, <span class="s1">'version'</span>: 0, <span class="s1">'score'</span>: 0.9999998, <span class="s1">'payload'</span>: None, <span class="s1">'vector'</span>: None<span class="o">}]</span>, <span class="s1">'status'</span>: <span class="s1">'ok'</span>, <span class="s1">'time'</span>: 0.000117235<span class="o">}</span>
<span class="o">></span> DELETE https://my-cluster.cloud.qdrant.io:6333/collections/my-index
< DELETE https://my-cluster.cloud.qdrant.io:6333/collections/my-index - 200</code></pre></figure>
<h3 id="redis">Redis</h3>
<p><a href="https://redis.io/">Redis</a> is a fast, opinionated, open-source database. Its <a href="https://redis.io/docs/interact/search-and-query/search/vectors/">similarity vector search</a> comes with <code class="language-plaintext highlighter-rouge">FLAT</code> and <code class="language-plaintext highlighter-rouge">HNSW</code> indexing methods (field types). Redis is licensed under BSD.</p>
<p>I prefer to run Redis locally in Docker with <code class="language-plaintext highlighter-rouge">docker run -p 6379:6379 redislabs/redisearch:latest</code>, but managed service options with free tiers also <a href="https://redis.com/">exist</a>.</p>
<p>Redis speaks <a href="https://redis.io/docs/reference/protocol-spec/">RESP</a>, which is not HTTP, hence we’re going to use <a href="https://github.com/redis/redis-py">redis-py</a>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">r</span> <span class="o">=</span> <span class="n">Redis</span><span class="p">(</span><span class="n">host</span><span class="o">=</span><span class="s">'localhost'</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">6379</span><span class="p">,</span> <span class="n">decode_responses</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span></code></pre></figure>
<p>We create an <code class="language-plaintext highlighter-rouge">HNSW</code> index called <code class="language-plaintext highlighter-rouge">vectors</code> of documents with a given <code class="language-plaintext highlighter-rouge">doc:</code> prefix. This is unlike other databases where you write docs into an index.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">index_name</span> <span class="o">=</span> <span class="s">"vectors"</span>
<span class="n">doc_prefix</span> <span class="o">=</span> <span class="s">"doc:"</span>
<span class="n">schema</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">TagField</span><span class="p">(</span><span class="s">"genre"</span><span class="p">),</span>
<span class="n">VectorField</span><span class="p">(</span><span class="s">"values"</span><span class="p">,</span>
<span class="s">"HNSW"</span><span class="p">,</span> <span class="p">{</span>
<span class="s">"TYPE"</span><span class="p">:</span> <span class="s">"FLOAT32"</span><span class="p">,</span>
<span class="s">"DIM"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
<span class="s">"DISTANCE_METRIC"</span><span class="p">:</span> <span class="s">"COSINE"</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="n">definition</span> <span class="o">=</span> <span class="n">IndexDefinition</span><span class="p">(</span>
<span class="n">prefix</span><span class="o">=</span><span class="p">[</span><span class="n">doc_prefix</span><span class="p">],</span>
<span class="n">index_type</span><span class="o">=</span><span class="n">IndexType</span><span class="p">.</span><span class="n">HASH</span>
<span class="p">)</span>
<span class="n">r</span><span class="p">.</span><span class="n">ft</span><span class="p">(</span><span class="n">index_name</span><span class="p">).</span><span class="n">create_index</span><span class="p">(</span><span class="n">fields</span><span class="o">=</span><span class="n">schema</span><span class="p">,</span> <span class="n">definition</span><span class="o">=</span><span class="n">definition</span><span class="p">)</span></code></pre></figure>
<p>Insert some vectors. Redis hashes are flat and can’t hold a nested dictionary for metadata, so we store <code class="language-plaintext highlighter-rouge">genre</code> as a top-level field. The schema above indexes it as a tag, so we can filter by it in search.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">pipe</span> <span class="o">=</span> <span class="n">r</span><span class="p">.</span><span class="n">ft</span><span class="p">(</span><span class="n">index_name</span><span class="p">).</span><span class="n">pipeline</span><span class="p">()</span>
<span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"action"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">]</span>
<span class="k">for</span> <span class="n">vector</span> <span class="ow">in</span> <span class="n">vectors</span><span class="p">:</span>
<span class="n">key</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">doc_prefix</span><span class="si">}{</span><span class="n">vector</span><span class="p">[</span><span class="s">'id'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span>
<span class="n">value</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"genre"</span><span class="p">:</span> <span class="n">vector</span><span class="p">[</span><span class="s">"metadata"</span><span class="p">][</span><span class="s">"genre"</span><span class="p">],</span>
<span class="s">"values"</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">vector</span><span class="p">[</span><span class="s">"values"</span><span class="p">]).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">).</span><span class="n">tobytes</span><span class="p">()</span>
<span class="p">}</span>
<span class="n">pipe</span><span class="p">.</span><span class="n">hset</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">mapping</span><span class="o">=</span><span class="n">value</span><span class="p">)</span>
<span class="n">pipe</span><span class="p">.</span><span class="n">execute</span><span class="p">()</span></code></pre></figure>
<p>Search. We filter by <code class="language-plaintext highlighter-rouge">genre</code> with <code class="language-plaintext highlighter-rouge">@genre:{ action })</code>. Use <code class="language-plaintext highlighter-rouge">**</code> instead if you don’t want filtering.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">query</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">Query</span><span class="p">(</span><span class="s">"(@genre:{ action })=>[KNN 2 @values $vector as score]"</span><span class="p">)</span>
<span class="p">.</span><span class="n">sort_by</span><span class="p">(</span><span class="s">"score"</span><span class="p">)</span>
<span class="p">.</span><span class="n">return_fields</span><span class="p">(</span><span class="s">"id"</span><span class="p">,</span> <span class="s">"score"</span><span class="p">,</span> <span class="s">"genre"</span><span class="p">)</span>
<span class="p">.</span><span class="n">dialect</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">query_params</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"vector"</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">]).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">).</span><span class="n">tobytes</span><span class="p">()</span>
<span class="p">}</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">r</span><span class="p">.</span><span class="n">ft</span><span class="p">(</span><span class="n">index_name</span><span class="p">).</span><span class="n">search</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">query_params</span><span class="p">).</span><span class="n">docs</span>
<span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span></code></pre></figure>
<p>Finally, delete the index with its vectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">r</span><span class="p">.</span><span class="n">ft</span><span class="p">(</span><span class="n">index_name</span><span class="p">).</span><span class="n">dropindex</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span></code></pre></figure>
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/redis/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">poetry run ./hello.py
Document <span class="o">{</span><span class="s1">'id'</span>: <span class="s1">'doc:2'</span>, <span class="s1">'payload'</span>: None, <span class="s1">'score'</span>: <span class="s1">'0.00741678476334'</span>, <span class="s1">'genre'</span>: <span class="s1">'action'</span><span class="o">}</span></code></pre></figure>
<h3 id="vespa">Vespa</h3>
<p><a href="https://vespa.ai/">Vespa</a> is a fully featured search engine and vector database. It supports approximate nearest neighbor search, lexical search, and search in structured data, all in the same query. Vespa is Apache 2.0 licensed, and can be run in a variety of ways, including Docker and as a managed <a href="https://cloud.vespa.ai/">cloud service</a>.</p>
<p>Let’s use their Docker container for this example. Make sure you <a href="https://docs.docker.com/desktop/settings/mac/#resources">configure Docker with at least 4GB RAM</a> (check with <code class="language-plaintext highlighter-rouge">docker info | grep "Total Memory"</code>).</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">docker pull vespaengine/vespa
docker run <span class="nt">--detach</span> <span class="nt">--name</span> vespa <span class="nt">--hostname</span> vespa-container <span class="se">\</span>
<span class="nt">--publish</span> 8080:8080 <span class="nt">--publish</span> 19071:19071 <span class="se">\</span>
vespaengine/vespa</code></pre></figure>
<p>This container listens on port <code class="language-plaintext highlighter-rouge">8080</code> for search and ingestion APIs, and on <code class="language-plaintext highlighter-rouge">19071</code> for configuration APIs.</p>
<p>Vespa encapsulates the concept of a schema/index in an application that needs to be defined and deployed, so it is not as straightforward as the previous examples.</p>
<p>To create a new application with a sample vector schema, we need a <code class="language-plaintext highlighter-rouge">services.xml</code> file with the overall application properties, and a <code class="language-plaintext highlighter-rouge">vector.sd</code> schema file with the definition of our schema. For this example, let’s create the following directory structure.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell">vector-app/
├── schemas/
│ └── vector.sd
└── services.xml</code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">services.xml</code>:</p>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="cp"><?xml version="1.0" encoding="utf-8" ?></span>
<span class="nt"><services</span> <span class="na">version=</span><span class="s">"1.0"</span> <span class="na">xmlns:deploy=</span><span class="s">"vespa"</span> <span class="na">xmlns:preprocess=</span><span class="s">"properties"</span><span class="nt">></span>
<span class="nt"><container</span> <span class="na">id=</span><span class="s">"default"</span> <span class="na">version=</span><span class="s">"1.0"</span><span class="nt">></span>
<span class="nt"><document-api/></span>
<span class="nt"><search/></span>
<span class="nt"><nodes></span>
<span class="nt"><node</span> <span class="na">hostalias=</span><span class="s">"node1"</span> <span class="nt">/></span>
<span class="nt"></nodes></span>
<span class="nt"></container></span>
<span class="nt"><content</span> <span class="na">id=</span><span class="s">"vector"</span> <span class="na">version=</span><span class="s">"1.0"</span><span class="nt">></span>
<span class="nt"><redundancy></span>2<span class="nt"></redundancy></span>
<span class="nt"><documents></span>
<span class="nt"><document</span> <span class="na">type=</span><span class="s">"vector"</span> <span class="na">mode=</span><span class="s">"index"</span> <span class="nt">/></span>
<span class="nt"></documents></span>
<span class="nt"><nodes></span>
<span class="nt"><node</span> <span class="na">hostalias=</span><span class="s">"node1"</span> <span class="na">distribution-key=</span><span class="s">"0"</span> <span class="nt">/></span>
<span class="nt"></nodes></span>
<span class="nt"></content></span>
<span class="nt"></services></span></code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">vector.sd</code>:</p>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml">schema vector {
    document vector {
        field id type string {
            indexing: summary | attribute
        }
        field values type tensor<span class="nt"><float></span>(x[3]) {
            indexing: summary | attribute
            attribute {
                distance-metric: angular
            }
        }
        struct metadatatype {
            field genre type string {}
        }
        field metadata type metadatatype {
            indexing: summary
        }
    }
    rank-profile vector_similarity {
        inputs {
            query(vector_query_embedding) tensor<span class="nt"><float></span>(x[3])
        }
        first-phase {
            expression: closeness(field, values)
        }
    }
}</code></pre></figure>
<p>Deploy using the configuration API.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="o">(</span><span class="nb">cd </span>vector-app <span class="o">&&</span> zip <span class="nt">-r</span> - .<span class="o">)</span> | <span class="se">\</span>
curl <span class="nt">--header</span> Content-Type:application/zip <span class="nt">--data-binary</span> @- <span class="se">\</span>
localhost:19071/application/v2/tenant/default/prepareandactivate
curl <span class="se">\</span>
<span class="nt">--header</span> Content-Type:application/zip <span class="se">\</span>
<span class="nt">-XPOST</span> localhost:19071/application/v2/tenant/default/session</code></pre></figure>
<p>Set up the client.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">endpoint</span> <span class="o">=</span> <span class="s">"https://localhost:8080"</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">(</span><span class="n">verify</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"Accept"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
<span class="s">"Content-Type"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
<span class="p">}</span></code></pre></figure>
<p>Ingest some vectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec1"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="s">"vec2"</span><span class="p">,</span>
<span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s">"genre"</span><span class="p">:</span> <span class="s">"comedy"</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">]</span>
<span class="k">for</span> <span class="n">vector</span> <span class="ow">in</span> <span class="n">vectors</span><span class="p">:</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">({</span><span class="s">"fields"</span><span class="p">:</span> <span class="n">vector</span><span class="p">})</span>
<span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">"/document/v1/vector/vector/docid/"</span> <span class="o">+</span> <span class="n">vector</span><span class="p">[</span><span class="s">"id"</span><span class="p">]),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">data</span><span class="o">=</span><span class="n">data</span>
<span class="p">)</span></code></pre></figure>
<p>Search.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">query</span> <span class="o">=</span> <span class="s">"yql=select * from sources * where {targetHits: 1} nearestNeighbor(values,vector_query_embedding)"</span> \
<span class="s">"&ranking.profile=vector_similarity"</span> \
<span class="s">"&hits=1"</span> \
<span class="s">"&input.query(vector_query_embedding)=[0.1,0.2,0.3]"</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">get</span><span class="p">(</span>
<span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">"/search/"</span><span class="p">),</span>
<span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="n">query</span>
<span class="p">).</span><span class="n">json</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="n">results</span><span class="p">[</span><span class="s">"root"</span><span class="p">][</span><span class="s">"children"</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="s">"fields"</span><span class="p">])</span></code></pre></figure>
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/vespa/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">ENDPOINT</span><span class="o">=</span>https://localhost:8080 <span class="nv">CONFIG_ENDPOINT</span><span class="o">=</span>https://localhost:19071 poetry run src/vespa/hello.py
<span class="o">></span> POST https://localhost:8080/document/v1/vector/vector/docid/vec1
< POST https://localhost:8080/document/v1/vector/vector/docid/vec1 - 200
<span class="o">></span> POST https://localhost:8080/document/v1/vector/vector/docid/vec2
< POST https://localhost:8080/document/v1/vector/vector/docid/vec2 - 200
<span class="o">></span> GET https://localhost:8080/search/?yql<span class="o">=</span><span class="k">select</span>%20%2A%20from%20sources%20%2A%20where%20%7BtargetHits%3A%201%7DnearestNeighbor%28values%2Cvector_query_embedding%29&ranking.profile<span class="o">=</span>vector_similarity&hits<span class="o">=</span>1&input.query%28vector_query_embedding%29<span class="o">=</span>%5B0.1%2C0.2%2C0.3%5D
< GET https://localhost:8080/search/?yql<span class="o">=</span><span class="k">select</span>%20%2A%20from%20sources%20%2A%20where%20%7BtargetHits%3A%201%7DnearestNeighbor%28values%2Cvector_query_embedding%29&ranking.profile<span class="o">=</span>vector_similarity&hits<span class="o">=</span>1&input.query%28vector_query_embedding%29<span class="o">=</span>%5B0.1%2C0.2%2C0.3%5D - 200
<span class="o">{</span><span class="s1">'sddocname'</span>: <span class="s1">'vector'</span>, <span class="s1">'documentid'</span>: <span class="s1">'id:vector:vector::vec1'</span>, <span class="s1">'id'</span>: <span class="s1">'vec1'</span>, <span class="s1">'values'</span>: <span class="o">{</span><span class="s1">'type'</span>: <span class="s1">'tensor<float>(x[3])'</span>, <span class="s1">'values'</span>: <span class="o">[</span>0.10000000149011612, 0.20000000298023224, 0.30000001192092896]<span class="o">}</span>, <span class="s1">'metadata'</span>: <span class="o">{</span><span class="s1">'genre'</span>: <span class="s1">'drama'</span><span class="o">}}</span>
<span class="o">></span> DELETE https://localhost:19071/application/v2/tenant/default/application/default
< DELETE https://localhost:19071/application/v2/tenant/default/application/default - 200</code></pre></figure>
<h3 id="weaviate">Weaviate</h3>
<p><a href="https://weaviate.io">Weaviate</a> is a vector search engine designed for natural language and numerical data. It uses contextualized embeddings stored with data objects to capture semantic similarity. At the time of writing it supports only Hierarchical Navigable Small World (HNSW) indexing, which makes building indexes relatively costly; in exchange, it offers fast queries and high scalability. Weaviate is open-source, easy to use, flexible, and extensible, and has a Contributor License Agreement.</p>
<p>After you sign up for Weaviate Cloud Services (WCS), create a new free-tier Weaviate cluster with authentication. Note your cluster URL and, optionally, your API key. The endpoint has the format https://myindex.weaviate.network.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">from</span> <span class="nn">urllib.parse</span> <span class="kn">import</span> <span class="n">urljoin</span>

<span class="n">client</span> <span class="o">=</span> <span class="n">Client</span><span class="p">()</span>
<span class="n">endpoint</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"ENDPOINT"</span><span class="p">]</span>
<span class="n">api_key</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">getenv</span><span class="p">(</span><span class="s">"API_KEY"</span><span class="p">)</span>  <span class="c1"># optional
</span>
<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">"Accept"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span><span class="p">,</span>
    <span class="s">"Content-Type"</span><span class="p">:</span> <span class="s">"application/json; charset=utf-8"</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">api_key</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
    <span class="n">headers</span><span class="p">[</span><span class="s">"Authorization"</span><span class="p">]</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"Bearer </span><span class="si">{</span><span class="n">api_key</span><span class="si">}</span><span class="s">"</span></code></pre></figure>
<p>It is easy to create some objects with vectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">vectors</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span>
        <span class="s">"id"</span><span class="p">:</span> <span class="s">"vec1"</span><span class="p">,</span>
        <span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">],</span>
        <span class="s">"properties"</span><span class="p">:</span> <span class="p">{</span>
            <span class="s">"genre"</span><span class="p">:</span> <span class="s">"drama"</span>
        <span class="p">}</span>
    <span class="p">},</span>
    <span class="p">{</span>
        <span class="s">"id"</span><span class="p">:</span> <span class="s">"vec2"</span><span class="p">,</span>
        <span class="s">"values"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span>
        <span class="s">"properties"</span><span class="p">:</span> <span class="p">{</span>
            <span class="s">"genre"</span><span class="p">:</span> <span class="s">"action"</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">]</span>
<span class="n">objects</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">vector</span> <span class="ow">in</span> <span class="n">vectors</span><span class="p">:</span>
    <span class="n">obj</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">"class"</span><span class="p">:</span> <span class="s">"Vectors"</span><span class="p">,</span>
        <span class="s">"properties"</span><span class="p">:</span> <span class="p">{</span>
            <span class="s">"vector"</span><span class="p">:</span> <span class="n">vector</span><span class="p">[</span><span class="s">"values"</span><span class="p">]</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">objects</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
<span class="n">client</span><span class="p">.</span><span class="n">post</span><span class="p">(</span>
    <span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">"/v1/batch/objects"</span><span class="p">),</span>
    <span class="n">json</span><span class="o">=</span><span class="p">{</span><span class="s">"objects"</span><span class="p">:</span> <span class="n">objects</span><span class="p">},</span>
    <span class="n">headers</span><span class="o">=</span><span class="n">headers</span>
<span class="p">)</span></code></pre></figure>
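<p>One thing to watch in the batch payload above: each embedding is stored as a regular property named <code class="language-plaintext highlighter-rouge">vector</code>, and <code class="language-plaintext highlighter-rouge">genre</code> is dropped. As far as I know, Weaviate expects a client-supplied embedding in each object’s top-level <code class="language-plaintext highlighter-rouge">vector</code> field, which is what <code class="language-plaintext highlighter-rouge">nearVector</code> searches run against. A minimal sketch of that payload shape, assuming the same <code class="language-plaintext highlighter-rouge">Vectors</code> class; <code class="language-plaintext highlighter-rouge">to_weaviate_objects</code> is a hypothetical helper, not part of the sample.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def to_weaviate_objects(vectors, class_name="Vectors"):
    """Build a /v1/batch/objects payload with one object per input vector."""
    objects = []
    for vector in vectors:
        objects.append({
            "class": class_name,
            # The embedding itself; nearVector searches run against this field.
            "vector": vector["values"],
            # Remaining metadata (e.g. genre) is stored as ordinary properties.
            "properties": dict(vector.get("properties", {})),
        })
    return {"objects": objects}


vectors = [
    {"id": "vec1", "values": [0.1, 0.2, 0.3], "properties": {"genre": "drama"}},
    {"id": "vec2", "values": [0.2, 0.3, 0.4], "properties": {"genre": "action"}},
]
payload = to_weaviate_objects(vectors)
print(payload["objects"][0])</code></pre></figure>
<p>The resulting dict would be posted to <code class="language-plaintext highlighter-rouge">/v1/batch/objects</code> with <code class="language-plaintext highlighter-rouge">json=payload</code>, like the snippet above. The <code class="language-plaintext highlighter-rouge">vec1</code>/<code class="language-plaintext highlighter-rouge">vec2</code> ids are left out because Weaviate object ids must be UUIDs.</p>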
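<p>Querying the REST objects endpoint is pretty straightforward. Note, however, that <code class="language-plaintext highlighter-rouge">GET /v1/objects</code> appears to list objects rather than rank them by <code class="language-plaintext highlighter-rouge">nearVector</code>: both objects come back in the output below. For similarity search, Weaviate also has a GraphQL interface.</p>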
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">query</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">"fields"</span><span class="p">:</span> <span class="s">"vector"</span><span class="p">,</span>
    <span class="s">"nearVector"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s">"vector"</span><span class="p">:</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">],</span>
        <span class="s">"certainty"</span><span class="p">:</span> <span class="mf">0.9</span>
    <span class="p">}</span>
<span class="p">}</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">get</span><span class="p">(</span>
    <span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">"/v1/objects"</span><span class="p">),</span>
    <span class="n">params</span><span class="o">=</span><span class="n">query</span><span class="p">,</span>
    <span class="n">headers</span><span class="o">=</span><span class="n">headers</span>
<span class="p">).</span><span class="n">json</span><span class="p">()</span>
<span class="k">for</span> <span class="n">obj</span> <span class="ow">in</span> <span class="n">response</span><span class="p">[</span><span class="s">"objects"</span><span class="p">]:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span></code></pre></figure>
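<p>For an actual similarity search, Weaviate’s GraphQL <code class="language-plaintext highlighter-rouge">Get</code> query accepts a <code class="language-plaintext highlighter-rouge">nearVector</code> argument. The sketch below only builds the request body for <code class="language-plaintext highlighter-rouge">POST /v1/graphql</code>. It assumes objects were stored with their embedding in the top-level <code class="language-plaintext highlighter-rouge">vector</code> field and have a <code class="language-plaintext highlighter-rouge">genre</code> property; <code class="language-plaintext highlighter-rouge">near_vector_query</code> is a hypothetical helper.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import json


def near_vector_query(class_name, vector, certainty, fields, limit=1):
    """Build a GraphQL request body for a Weaviate nearVector search."""
    # json.dumps renders a Python list of floats as a valid GraphQL list literal.
    args = f"nearVector: {{vector: {json.dumps(vector)}, certainty: {certainty}}}, limit: {limit}"
    selection = " ".join(fields)
    query = (
        "{ Get { "
        f"{class_name}({args}) "
        f"{{ {selection} _additional {{ id certainty }} }}"
        " } }"
    )
    return {"query": query}


body = near_vector_query("Vectors", [0.1, 0.2, 0.3], 0.9, ["genre"])
print(body["query"])</code></pre></figure>
<p>Sending it with <code class="language-plaintext highlighter-rouge">client.post(urljoin(endpoint, "/v1/graphql"), json=body, headers=headers)</code> should return matches under <code class="language-plaintext highlighter-rouge">data.Get.Vectors</code>, each with an <code class="language-plaintext highlighter-rouge">_additional</code> id and certainty.</p>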
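<p>Deleting the <code class="language-plaintext highlighter-rouge">Vectors</code> class from the schema is straightforward, and removes all of its objects along with it.</p>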
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">client</span><span class="p">.</span><span class="n">delete</span><span class="p">(</span>
    <span class="n">urljoin</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="s">"/v1/schema/Vectors"</span><span class="p">),</span>
    <span class="n">headers</span><span class="o">=</span><span class="n">headers</span>
<span class="p">)</span></code></pre></figure>
<p>You can see and run a <a href="https://github.com/dblock/vectordb-hello-world/blob/main/src/weaviate/hello.py">working sample from here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">API_KEY</span><span class="o">=</span>... <span class="nv">ENDPOINT</span><span class="o">=</span>https://myindex.weaviate.network poetry run src/weaviate/hello.py
<span class="o">></span> POST https://myindex.weaviate.network/v1/batch/objects
< POST https://myindex.weaviate.network/v1/batch/objects - 200
<span class="o">></span> GET https://myindex.weaviate.network/v1/objects?fields<span class="o">=</span>vector&nearVector<span class="o">=</span>%7B%27vector%27%3A%20%5B0.1%5D%2C%20%27certainty%27%3A%200.9%7D
< GET https://myindex.weaviate.network/v1/objects?fields<span class="o">=</span>vector&nearVector<span class="o">=</span>%7B%27vector%27%3A%20%5B0.1%5D%2C%20%27certainty%27%3A%200.9%7D - 200
<span class="o">{</span><span class="s1">'class'</span>: <span class="s1">'Vectors'</span>, <span class="s1">'creationTimeUnix'</span>: 1688914857307, <span class="s1">'id'</span>: <span class="s1">'46e40d05-d550-4415-aa2c-7c004fcdd037'</span>, <span class="s1">'lastUpdateTimeUnix'</span>: 1688914857307, <span class="s1">'properties'</span>: <span class="o">{</span><span class="s1">'vector'</span>: <span class="o">[</span>0.1, 0.2, 0.3]<span class="o">}</span>, <span class="s1">'vectorWeights'</span>: None<span class="o">}</span>
<span class="o">{</span><span class="s1">'class'</span>: <span class="s1">'Vectors'</span>, <span class="s1">'creationTimeUnix'</span>: 1688914857307, <span class="s1">'id'</span>: <span class="s1">'c14bd5b1-8b81-44a4-8051-3b9b8c52cde4'</span>, <span class="s1">'lastUpdateTimeUnix'</span>: 1688914857307, <span class="s1">'properties'</span>: <span class="o">{</span><span class="s1">'vector'</span>: <span class="o">[</span>0.2, 0.3, 0.4]<span class="o">}</span>, <span class="s1">'vectorWeights'</span>: None<span class="o">}</span>
<span class="o">></span> DELETE https://myindex.weaviate.network/v1/schema/Vectors
< DELETE https://myindex.weaviate.network/v1/schema/Vectors - 200</code></pre></figure>
<h3 id="others">Others</h3>
<p>This blog post and <a href="https://github.com/dblock/vectordb-hello-world/">its code</a> could use your help for more examples for <a href="https://github.com/milvus-io/milvus">Milvus</a>, <a href="https://github.com/vector-ai/vectorai">Vector.ai</a>, <a href="https://github.com/nuclia/nucliadb">NucliaDB</a>, <a href="https://vald.vdaas.org/">Vald</a>, etc.</p>
<p>I also wonder whether we need a generic client that’s agnostic to which vector DB is being used, to help make code portable. I <a href="https://github.com/dblock/vectordb-client">took a stab at a very simple prototype</a>.</p>
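<p>To make the idea concrete, here is a minimal sketch of what such a portable interface could look like. It is not the prototype’s actual API: <code class="language-plaintext highlighter-rouge">VectorStore</code>, <code class="language-plaintext highlighter-rouge">InMemoryStore</code> and <code class="language-plaintext highlighter-rouge">hello</code> are hypothetical names, and the in-memory backend simply ranks by cosine similarity. A real backend would implement the same three methods on top of REST calls like the Vespa and Weaviate ones above.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import math
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface that each vector DB backend would implement."""

    def upsert(self, doc_id, values, metadata=None): ...

    def query(self, values, top_k=1): ...

    def delete_all(self): ...


class InMemoryStore:
    """Toy backend: keeps vectors in a dict and ranks them by cosine similarity."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, values, metadata=None):
        self.docs[doc_id] = (list(values), metadata or {})

    def query(self, values, top_k=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(doc_id, cosine(values, vec)) for doc_id, (vec, _) in self.docs.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def delete_all(self):
        self.docs.clear()


def hello(store: VectorStore):
    """Portable hello-world: only uses the VectorStore methods."""
    store.upsert("vec1", [0.1, 0.2, 0.3], {"genre": "drama"})
    store.upsert("vec2", [0.2, 0.3, 0.4], {"genre": "action"})
    results = store.query([0.1, 0.2, 0.3], top_k=1)
    store.delete_all()
    return results


print(hello(InMemoryStore()))</code></pre></figure>
<p>Code like <code class="language-plaintext highlighter-rouge">hello</code> depends only on the three methods, so swapping databases would mean passing a different backend object. A <code class="language-plaintext highlighter-rouge">typing.Protocol</code> keeps this structural: backends don’t need to inherit from anything.</p>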
<p><a href="https://code.dblock.org/2023/06/16/getting-started-with-vector-dbs-in-python.html">Getting started with Vector DBs in Python</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on June 16, 2023.</p>https://code.dblock.org/2023/04/29/triggering-ci-from-pull-requests-and-force-pushes-in-github-actions2023-04-29T00:00:00+00:002023-04-29T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>The <a href="https://github.com/slack-ruby/slack-ruby-client/">slack-ruby-client</a> generates code from an <a href="https://github.com/slack-ruby/slack-api-ref">API reference</a> scraped from the Slack documentation website. Until now, the update process was a manual operation involving checking out the code, running a <code class="language-plaintext highlighter-rouge">rake</code> task, updating a <code class="language-plaintext highlighter-rouge">CHANGELOG.md</code>, and making a pull request, e.g. <a href="https://github.com/slack-ruby/slack-ruby-client/pull/455">slack-ruby-client#455</a>.</p>
<p>Let’s automate this using GitHub Actions (GHA)! We’ll need some advanced token-fu to auto-trigger CI.</p>
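<p>Start with a basic job that runs on a schedule, daily at 11:15 PM UTC (GitHub Actions cron schedules use UTC), and that can also be triggered manually through <code class="language-plaintext highlighter-rouge">workflow_dispatch</code>.</p>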
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">name</span><span class="pi">:</span> <span class="s">Update API</span>
<span class="na">on</span><span class="pi">:</span>
  <span class="na">workflow_dispatch</span><span class="pi">:</span>
  <span class="na">schedule</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">cron</span><span class="pi">:</span> <span class="s2">"</span><span class="s">15</span><span class="nv"> </span><span class="s">23</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*"</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">update-api</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span></code></pre></figure>
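<p>GitHub Actions evaluates <code class="language-plaintext highlighter-rouge">schedule</code> cron expressions in UTC, and scheduled runs can be delayed when GitHub is under heavy load. To sanity-check when <code class="language-plaintext highlighter-rouge">15 23 * * *</code> fires next, here is a small Python sketch. It only handles a fixed daily “minute hour * * *” expression, and <code class="language-plaintext highlighter-rouge">next_daily_run</code> is a hypothetical helper.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=23, minute=15):
    """Return the next UTC time matching a daily "minute hour * * *" cron."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # A run at exactly "now" counts as already started, so move to tomorrow.
    if now >= candidate:
        candidate += timedelta(days=1)
    return candidate


print(next_daily_run(datetime(2023, 4, 29, 22, 0, tzinfo=timezone.utc)))
# 2023-04-29 23:15:00+00:00</code></pre></figure>
<p>Pass a timezone-aware datetime: <code class="language-plaintext highlighter-rouge">astimezone</code> interprets naive values as local time.</p>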
<p>Scope permissions to r/w access to repo contents and pull requests.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">permissions</span><span class="pi">:</span>
  <span class="na">contents</span><span class="pi">:</span> <span class="s">write</span>
  <span class="na">pull-requests</span><span class="pi">:</span> <span class="s">write</span></code></pre></figure>
<p>Check out the code, and run the <code class="language-plaintext highlighter-rouge">rake</code> task that updates the API.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">steps</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v3</span>
    <span class="na">with</span><span class="pi">:</span>
      <span class="na">submodules</span><span class="pi">:</span> <span class="s">recursive</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Set up Ruby</span>
    <span class="na">uses</span><span class="pi">:</span> <span class="s">ruby/setup-ruby@v1</span>
    <span class="na">with</span><span class="pi">:</span>
      <span class="na">ruby-version</span><span class="pi">:</span> <span class="s2">"</span><span class="s">3.2"</span>
      <span class="na">bundler-cache</span><span class="pi">:</span> <span class="no">true</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Update API from slack-api-ref</span>
    <span class="na">run</span><span class="pi">:</span> <span class="s">bundle exec rake slack:api:update</span></code></pre></figure>
<p>Create a pull request with the changes.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Create pull request</span>
  <span class="na">id</span><span class="pi">:</span> <span class="s">cpr</span>
  <span class="na">uses</span><span class="pi">:</span> <span class="s">peter-evans/create-pull-request@v4</span>
  <span class="na">with</span><span class="pi">:</span>
    <span class="na">token</span><span class="pi">:</span> <span class="s">${{ secrets.GITHUB_TOKEN }}</span>
    <span class="na">commit-message</span><span class="pi">:</span> <span class="s">Update API from slack-api-ref</span>
    <span class="na">title</span><span class="pi">:</span> <span class="s">Update API from slack-api-ref</span>
    <span class="na">body</span><span class="pi">:</span> <span class="pi">|</span>
      <span class="s">Update API from slack-api-ref.</span>
    <span class="na">branch</span><span class="pi">:</span> <span class="s">automated-api-update</span>
    <span class="na">base</span><span class="pi">:</span> <span class="s">master</span></code></pre></figure>
<p>This works, but the pull request it opens does not trigger CI. This is by design: events created with <code class="language-plaintext highlighter-rouge">GITHUB_TOKEN</code> are <a href="https://github.com/peter-evans/create-pull-request/issues/48">not allowed to</a> start new workflow runs, which prevents accidental recursive workflows.</p>
<p>To trigger CI we need a different token. You could create a personal access token (PAT), but CI would then run under your account, which may prevent you from approving the resulting PRs because of branch protection rules. A better solution is to use a token from <a href="https://docs.github.com/en/apps/creating-github-apps">an org-owned GitHub app</a>. I created one called “Slack Ruby CI Bot”, gave it read/write permissions for “Contents” and “Pull Requests”, installed it in the <a href="https://github.com/slack-ruby">slack-ruby GitHub org</a>, and noted the installation ID. I also generated a new private key at the bottom of the <a href="https://github.com/organizations/slack-ruby/settings/apps/slack-ruby-ci-bot">app settings page</a> and set two repo secrets: <code class="language-plaintext highlighter-rouge">CI_APP_ID</code> to the app ID, and <code class="language-plaintext highlighter-rouge">CI_APP_PRIVATE_KEY</code> to the contents of the <code class="language-plaintext highlighter-rouge">.pem</code> private key file downloaded from GitHub.</p>
<p>Get the app token in GHA.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">GitHub App token</span>
  <span class="na">id</span><span class="pi">:</span> <span class="s">github_app_token</span>
  <span class="na">uses</span><span class="pi">:</span> <span class="s">tibdex/github-app-token@v1.6.0</span>
  <span class="na">with</span><span class="pi">:</span>
    <span class="na">app_id</span><span class="pi">:</span> <span class="s">${{ secrets.CI_APP_ID }}</span>
    <span class="na">private_key</span><span class="pi">:</span> <span class="s">${{ secrets.CI_APP_PRIVATE_KEY }}</span>
    <span class="na">installation_id</span><span class="pi">:</span> <span class="s">36985419</span></code></pre></figure>
<p>Use it in the pull request GHA, with a fallback to <code class="language-plaintext highlighter-rouge">GITHUB_TOKEN</code> for testing the GHA in my fork.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Create pull request</span>
  <span class="na">id</span><span class="pi">:</span> <span class="s">cpr</span>
  <span class="na">uses</span><span class="pi">:</span> <span class="s">peter-evans/create-pull-request@v4</span>
  <span class="na">with</span><span class="pi">:</span>
    <span class="na">token</span><span class="pi">:</span> <span class="s">${{ steps.github_app_token.outputs.token || secrets.GITHUB_TOKEN }}</span></code></pre></figure>
<p>Now that PRs trigger CI, and commits are made by <code class="language-plaintext highlighter-rouge">slack-ruby-ci-bot</code>, let’s update <code class="language-plaintext highlighter-rouge">CHANGELOG.md</code> with the PR number output by the <code class="language-plaintext highlighter-rouge">create-pull-request</code> action. A text search-and-replace will do.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">jacobtomlinson/gha-find-replace@v3</span>
  <span class="na">if</span><span class="pi">:</span> <span class="s">${{ steps.cpr.outputs.pull-request-number != '' }}</span>
  <span class="na">with</span><span class="pi">:</span>
    <span class="na">include</span><span class="pi">:</span> <span class="s">CHANGELOG.md</span>
    <span class="na">find</span><span class="pi">:</span> <span class="s2">"</span><span class="se">\\</span><span class="s">*</span><span class="nv"> </span><span class="s">Your</span><span class="nv"> </span><span class="s">contribution</span><span class="nv"> </span><span class="s">here."</span>
    <span class="na">replace</span><span class="pi">:</span> <span class="s2">"</span><span class="s">*</span><span class="nv"> </span><span class="s">[#${{steps.cpr.outputs.pull-request-number}}]</span><span class="nv"> </span><span class="s">...</span><span class="se">\n</span><span class="s">*</span><span class="nv"> </span><span class="s">Your</span><span class="nv"> </span><span class="s">contribution</span><span class="nv"> </span><span class="s">here."</span></code></pre></figure>
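<p>The find-and-replace step is a regular-expression substitution on <code class="language-plaintext highlighter-rouge">CHANGELOG.md</code>, which is why the <code class="language-plaintext highlighter-rouge">*</code> is escaped. The same edit in Python is handy for testing the pattern locally. <code class="language-plaintext highlighter-rouge">update_changelog</code> is a hypothetical helper, the sample CHANGELOG text is made up, and <code class="language-plaintext highlighter-rouge">...</code> stands for the rest of the entry, as in the workflow.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import re

# Similar to the workflow's `find` pattern, with the trailing dot escaped too.
PLACEHOLDER = re.compile(r"\* Your contribution here\.")


def update_changelog(text, pr_number):
    """Insert an entry for pr_number above the first placeholder line."""
    entry = f"* [#{pr_number}] ...\n* Your contribution here."
    # A function replacement inserts `entry` verbatim, with no escape processing.
    return PLACEHOLDER.sub(lambda match: entry, text, count=1)


before = "### Next\n\n* Your contribution here.\n"
print(update_changelog(before, 465))</code></pre></figure>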
<p>We can now amend the commit in the pull request and force-push the change back to GitHub. To authenticate with the app token, we generate a base64-encoded Basic auth header from <code class="language-plaintext highlighter-rouge">x-access-token:token</code>, then add it to all HTTP requests made by <code class="language-plaintext highlighter-rouge">git</code>. This is what the <code class="language-plaintext highlighter-rouge">create-pull-request</code> action does in its own code, too.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Commit and Push</span>
  <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">git config --local user.name 'slack-ruby-ci-bot'</span>
    <span class="s">git config --local user.email 'noreply@github.com'</span>
    <span class="s">git config --local --unset-all http.https://github.com/.extraheader || true</span>
    <span class="s">AUTH=$(echo -n "x-access-token:${{ steps.github_app_token.outputs.token || secrets.GITHUB_TOKEN }}" | base64)</span>
    <span class="s">echo "::add-mask::${AUTH}"</span>
    <span class="s">git config --local http.https://github.com/.extraheader "AUTHORIZATION: basic ${AUTH}"</span>
    <span class="s">git add CHANGELOG.md</span>
    <span class="s">git commit --amend --no-edit</span>
    <span class="s">git push origin automated-api-update -f</span></code></pre></figure>
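<p>The <code class="language-plaintext highlighter-rouge">extraheader</code> value is plain HTTP Basic authentication, with <code class="language-plaintext highlighter-rouge">x-access-token</code> as the username and the token as the password. A Python sketch of the same computation, using a made-up token; <code class="language-plaintext highlighter-rouge">git_extraheader</code> is a hypothetical helper.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import base64


def git_extraheader(token):
    """Return the http.extraheader value used to authenticate git over HTTPS."""
    credentials = f"x-access-token:{token}".encode("utf-8")
    # Same bytes as `echo -n "x-access-token:..." | base64`, but never line-wrapped.
    encoded = base64.b64encode(credentials).decode("ascii")
    return f"AUTHORIZATION: basic {encoded}"


print(git_extraheader("ghs_example"))</code></pre></figure>
<p>Python’s <code class="language-plaintext highlighter-rouge">b64encode</code> never inserts line breaks, while GNU <code class="language-plaintext highlighter-rouge">base64</code> wraps output at 76 characters, so a longer token would need <code class="language-plaintext highlighter-rouge">base64 -w 0</code> in the shell version.</p>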
<p>Bonus features include getting the current date and the git commit of the updated submodule that contains the API reference to make the CHANGELOG and the commit messages pretty.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Get current date</span>
  <span class="na">id</span><span class="pi">:</span> <span class="s">date</span>
  <span class="na">run</span><span class="pi">:</span> <span class="s">echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Get slack-api-ref ref</span>
  <span class="na">id</span><span class="pi">:</span> <span class="s">api-ref</span>
  <span class="na">run</span><span class="pi">:</span> <span class="s">echo "api-ref=$(git rev-parse --short HEAD:lib/slack/web/api/slack-api-ref)" >> $GITHUB_OUTPUT</span></code></pre></figure>
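<p>GitHub has deprecated <code class="language-plaintext highlighter-rouge">::set-output</code> in favor of appending <code class="language-plaintext highlighter-rouge">name=value</code> lines to the file named by the <code class="language-plaintext highlighter-rouge">GITHUB_OUTPUT</code> environment variable. Here is a Python sketch of a step script that sets the same <code class="language-plaintext highlighter-rouge">date</code> output. <code class="language-plaintext highlighter-rouge">set_output</code> is a hypothetical helper that only handles single-line values, and the scratch-file fallback for running outside Actions is an assumption for local testing.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import os
import tempfile
from datetime import date


def set_output(name, value):
    """Append a single-line name=value pair to the step's $GITHUB_OUTPUT file."""
    path = os.environ.get("GITHUB_OUTPUT")
    if path is None:
        # Not running inside GitHub Actions: write to a scratch file instead.
        path = os.path.join(tempfile.gettempdir(), "github_output.txt")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")
    return path


path = set_output("date", date.today().strftime("%Y-%m-%d"))</code></pre></figure>
<p>A later step would read the value as <code class="language-plaintext highlighter-rouge">${{ steps.date.outputs.date }}</code>, just as with <code class="language-plaintext highlighter-rouge">::set-output</code>.</p>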
<p>The final result is <a href="https://github.com/slack-ruby/slack-ruby-client/blob/master/.github/workflows/update_api.yml">here</a> and you can see it in action in <a href="https://github.com/slack-ruby/slack-ruby-client/pull/465">slack-ruby-client#465</a>.</p>
<p><a href="https://code.dblock.org/2023/04/29/triggering-ci-from-pull-requests-and-force-pushes-in-github-actions.html">Triggering CI from Pull Requests and Force Pushes in GitHub Actions</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on April 29, 2023.</p>https://code.dblock.org/2023/01/29/backing-up-digital-ocean-mongodb-to-dropbox2023-01-29T00:00:00+00:002023-01-29T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>After <a href="/2023/01/15/migrating-from-dokku-to-digital-ocean-apps.html">migrating my apps to DigitalOcean apps</a>, I started looking for an automated offsite MongoDB backup solution. DO automatically backs up all MongoDB databases daily, but I am paranoid and like to store an offsite copy of the data in Dropbox in monthly increments.</p>
<p>I first <a href="https://github.com/dblock/do-mongodb-backup">tried to build a DigitalOcean app function</a> that could run on a schedule and connect to my database, but ran into two missing features: <a href="https://ideas.digitalocean.com/app-framework-services/p/non-web-app-functions-that-cannot-be-invoked-externally-without-auth">non-web app functions</a>, and <a href="https://ideas.digitalocean.com/app-framework-services/p/add-functions-to-trusted-sources">adding functions to trusted sources</a>. In short, you can either make an app with a function that connects to a database, but then it’s always a web function with no cron support, or you can make a function that can be invoked on a cron, but cannot connect to your database.</p>
<p>I tried <a href="https://simplebackups.com/?via=dblock">simplebackups</a>, and found that the UX left something to be desired and that it was too expensive for the service it provided. In theory, it could connect to DO in a single click and set everything up, but in practice the UX didn’t always work: I had to manually allow-list a bunch of IPs, saw cryptic error messages in failing backups, etc. A serverless simple backup with your own storage costs $29/mo, which is too steep for my needs of a single database backup stored offsite. I’d just be paying for a daily cron, worth no more than $5 to me.</p>
<p>Finally, I settled on a cron job and <a href="https://github.com/dblock/dotfiles/blob/master/bash/bin/mongodb-dump">a script</a> that runs on my Mac. The script has some nice features, such as storing credentials in the macOS keychain, a pattern I reuse in a lot of similar scripts.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">AUTH</span><span class="o">=</span><span class="si">$(</span>security find-generic-password <span class="nt">-s</span> <span class="s2">"</span><span class="nv">$URI</span><span class="s2">"</span> <span class="nt">-w</span><span class="si">)</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$AUTH</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
  </span><span class="nb">read</span> <span class="nt">-p</span> <span class="s1">'MongoDB Username: '</span> USERNAME
  <span class="nb">read</span> <span class="nt">-sp</span> <span class="s1">'MongoDB Password: '</span> PASSWORD
  <span class="nb">printf</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span>
  <span class="nv">AUTH</span><span class="o">=</span><span class="nv">$USERNAME</span>:<span class="nv">$PASSWORD</span>
  security add-generic-password <span class="nt">-a</span> <span class="s2">"</span><span class="nv">$USER</span><span class="s2">"</span> <span class="nt">-s</span> <span class="s2">"</span><span class="nv">$URI</span><span class="s2">"</span> <span class="nt">-w</span> <span class="s2">"</span><span class="nv">$AUTH</span><span class="s2">"</span>
<span class="k">fi</span></code></pre></figure>
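<p>As a rough illustration of the monthly-increments idea only (the actual script is linked above), here is a Python sketch that names one gzipped archive per month and builds a <code class="language-plaintext highlighter-rouge">mongodump</code> command line. The Dropbox folder and the <code class="language-plaintext highlighter-rouge">monthly_dump_command</code> helper are hypothetical; <code class="language-plaintext highlighter-rouge">--uri</code>, <code class="language-plaintext highlighter-rouge">--gzip</code> and <code class="language-plaintext highlighter-rouge">--archive</code> are standard <code class="language-plaintext highlighter-rouge">mongodump</code> options.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import os
from datetime import date


def monthly_dump_command(uri, name, backup_dir, today=None):
    """Return (archive_path, argv) for a gzipped mongodump archive named by month."""
    today = today or date.today()
    archive = os.path.join(backup_dir, f"{name}-{today:%Y-%m}.archive.gz")
    argv = ["mongodump", f"--uri={uri}", "--gzip", f"--archive={archive}"]
    return archive, argv


archive, argv = monthly_dump_command(
    "mongodb+srv://user:password@db/app",
    "app",
    os.path.expanduser("~/Dropbox/Backups/mongodb"),
    today=date(2023, 1, 29),
)
print(" ".join(argv))</code></pre></figure>
<p>A cron job would then run the command with <code class="language-plaintext highlighter-rouge">subprocess.run(argv, check=True)</code>. Runs within the same month map to the same archive name.</p>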
<p>Note that to access a DO MongoDB database you need your <a href="https://github.com/dblock/dotfiles/blob/master/bash/bin/ip">external IP</a> in trusted sources. It’s annoying to update whenever my IP changes, but because I already get automated backups elsewhere, I am OK with this limitation.</p>
<p><a href="https://code.dblock.org/2023/01/29/backing-up-digital-ocean-mongodb-to-dropbox.html">Backing up DigitalOcean MongoDB to Dropbox</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on January 29, 2023.</p>https://code.dblock.org/2023/01/15/migrating-from-dokku-to-digital-ocean-apps2023-01-15T00:00:00+00:002023-01-15T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>In 2016 I <a href="/2016/02/08/running-slack-bots-on-digital-ocean-with-dokku.html">moved</a> half a dozen apps from Heroku to a DigitalOcean droplet to save money. I found <a href="https://github.com/dokku/dokku">dokku</a>, a docker-powered PaaS. It was already quite mature, and worked flawlessly. In 2023 I am moving back from the single droplet to apps, but staying on DigitalOcean. It was a good 7-year-long run for my droplet!</p>
<h3 id="what-am-i-moving">What am I moving?</h3>
<p>I’ve got 4 profitable and 5 money-losing or free Slack apps, all open-source.</p>
<ul>
<li><a href="https://www.playplay.io">www.playplay.io</a>: A ping-pong/chess/pool/tic-tac-toe leaderboard for Slack.</li>
<li><a href="https://slava.playplay.io/">slava.playplay.io</a>: Strava integration in Slack.</li>
<li><a href="https://sup.playplay.io/">sup.playplay.io</a>: Helps team members meet every week in an informal standup.</li>
<li><a href="https://market.playplay.io/">market.playplay.io</a>: Stock market quotes in Slack.</li>
<li><a href="https://moji.playplay.io/">moji.playplay.io</a>: More emoji in Slack.</li>
<li><a href="https://invite.playplay.io/">invite.playplay.io</a>: Help your users join your Slack.</li>
<li><a href="https://arena.playplay.io/">arena.playplay.io</a>: Are.na integration with Slack.</li>
<li><a href="https://shell.playplay.io/">shell.playplay.io</a>: Whoa, a bash shell inside Slack!</li>
<li><a href="https://api-explorer.playplay.io/">api-explorer.playplay.io</a>: A Slack web API explorer.</li>
</ul>
<h3 id="why-move">Why move?</h3>
<p>Over the years I got increasingly nervous about doing any kind of maintenance operations on the Linux droplet. Upgrading Dokku, or its plugins, under half a dozen applications had the potential side effect of taking all my projects down at once. Before doing anything drastic, I would cautiously snapshot my droplet. For major upgrades, I would even power the droplet down before making a snapshot, incurring half an hour of downtime. Then I’d type <code class="language-plaintext highlighter-rouge">sudo apt-get upgrade</code>, fingers crossed. A couple of times these operations would render the host inoperable, so I’d revert and figure out a manual path forward.</p>
<p>In early 2022 the inevitable happened: I <a href="https://github.com/dokku/dokku/issues/5523">got permanently stuck</a> with an old Linux distro that just would not upgrade the ancient 3.13 kernel to 4.x. Slack runs periodic pentests on its marketplace bots, and I was now running on non-LTS versions of Ruby, whereas newer versions would <a href="https://github.com/heroku/heroku-buildpack-ruby/issues/1312">not work on the old kernel</a> (<em>securerandom.rb:75:in ‘urandom’: failed to get urandom (RuntimeError)</em>). I was forced to upgrade, but every attempt to bring my Dokku apps back up on a 4.x kernel failed. Docker refused to start with my existing data.</p>
<p>I finally had to accept that I was just not smart enough to understand what <em>“aufs is not supported anymore”</em> meant, or how I was supposed to <em>“use overlay”</em> without losing all my existing data, despite the fact that <em>“as far as people know, only ephemeral container data is stored in that aufs path”</em>. I was <em>that</em> out of date on how Docker worked. I’ve finally reached the level of my incompetence!</p>
<p>The only workable way to stay on Dokku was to provision a new server with a newer Linux distro and migrate everything to it. Instead, I decided to evaluate other options. Because DigitalOcean had been a reliable and trusted platform for 7 years, I went with <a href="https://m.do.co/c/5b26011f9a9b">DigitalOcean apps</a>.</p>
<h3 id="migration-cookbook">Migration Cookbook</h3>
<p>Here’s a migration cookbook, mostly for my own reference.</p>
<h4 id="prepare">Prepare</h4>
<p>Lower the DNS TTL to a minute about an hour prior to migration.</p>
<h4 id="migrate-data">Migrate Data</h4>
<ol>
<li>Stop the dokku container on the droplet with <code class="language-plaintext highlighter-rouge">dokku ps:stop app</code>.</li>
<li>Lock the app to prevent future accidental deployments with <code class="language-plaintext highlighter-rouge">dokku apps:lock app</code>.</li>
<li>Export data from MongoDB with <code class="language-plaintext highlighter-rouge">dokku mongo:export app > app.dump.gz</code>.</li>
<li>Fetch the data from the droplet and back it up with <code class="language-plaintext highlighter-rouge">scp root@domain:/path/to/data/app.dump.gz .</code>.</li>
<li>Restore data into the new managed MongoDB database.</li>
</ol>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">mongorestore <span class="se">\</span>
<span class="nt">--uri</span> <span class="s2">"mongodb+srv://doadmin:password@db/admin?authSource=admin&replicaSet=db&tls=true"</span> <span class="se">\</span>
<span class="nt">--gzip</span> <span class="se">\</span>
<span class="nt">--archive</span><span class="o">=</span>app.dump.gz <span class="se">\</span>
<span class="nt">--nsInclude</span><span class="o">=</span><span class="s2">"app.*"</span></code></pre></figure>
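<p>When migrating several apps, it may help to generate these commands instead of retyping them. Below is a minimal Python sketch that only builds the data-migration commands above as strings (a dry run) so you can review them before running anything. The helper name <code class="language-plaintext highlighter-rouge">migration_commands</code> and the host, path, and URI arguments are hypothetical placeholders; the <code class="language-plaintext highlighter-rouge">dokku</code> commands are meant for the droplet, while <code class="language-plaintext highlighter-rouge">scp</code> and <code class="language-plaintext highlighter-rouge">mongorestore</code> run locally.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import shlex


def migration_commands(app, droplet, dump_dir, mongo_uri):
    """Build, but do not run, the data-migration commands for one app.

    droplet, dump_dir and mongo_uri are hypothetical placeholders you supply.
    """
    dump = f"{app}.dump.gz"
    quoted_app = shlex.quote(app)
    remote = shlex.quote(f"root@{droplet}:{dump_dir}/{dump}")
    return [
        # On the droplet: stop the app and lock it against accidental deploys.
        f"dokku ps:stop {quoted_app}",
        f"dokku apps:lock {quoted_app}",
        # On the droplet: export the app's MongoDB data as a gzipped archive.
        f"dokku mongo:export {quoted_app} > {shlex.quote(dump)}",
        # Locally: fetch the archive, which doubles as a backup.
        f"scp {remote} .",
        # Locally: restore only this app's namespaces into the managed database.
        " ".join([
            "mongorestore",
            "--uri", shlex.quote(mongo_uri),
            "--gzip",
            "--archive=" + shlex.quote(dump),
            "--nsInclude=" + shlex.quote(f"{app}.*"),
        ]),
    ]


for command in migration_commands("app", "domain", "/path/to/data",
                                  "mongodb+srv://doadmin:password@db/admin"):
    print(command)</code></pre></figure>
<p>Quoting the <code class="language-plaintext highlighter-rouge">app.*</code> namespace pattern keeps the shell from trying to expand it as a file glob.</p>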
<h4 id="create-an-app">Create an App</h4>
<ol>
<li>Set the new name to <code class="language-plaintext highlighter-rouge">app</code>.</li>
<li>Choose a GitHub repository for source code, grant permissions as needed.</li>
<li>Hit <code class="language-plaintext highlighter-rouge">Edit Plan</code>, reduce containers to 1, and choose a $5 basic or $12 pro plan.</li>
<li>Hit <code class="language-plaintext highlighter-rouge">Add Resource</code>, and add a previously created MongoDB database, which adds a user with proper authorizations.</li>
<li>Edit environment settings. Copy them from <code class="language-plaintext highlighter-rouge">dokku config app</code> on the droplet. Remove the automatically added <code class="language-plaintext highlighter-rouge">DATABASE_URL</code>, since it doesn’t include the right database name.</li>
<li>Set the MongoDB database URL <code class="language-plaintext highlighter-rouge">MONGO_URL: mongodb+srv://${db.USERNAME}:${db.PASSWORD}@${db.HOSTNAME}/app?authSource=admin&replicaSet=db&tls=true</code>.</li>
<li>Change the default app name to <code class="language-plaintext highlighter-rouge">app</code>.</li>
<li>Deploy the app.</li>
</ol>
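<p>Setting <code class="language-plaintext highlighter-rouge">MONGO_URL</code> amounts to pointing the connection string at the <code class="language-plaintext highlighter-rouge">app</code> database in the URL path while keeping the credentials, host, and query string. A minimal Python sketch of that rewrite, assuming a standard <code class="language-plaintext highlighter-rouge">mongodb+srv://</code> URL; <code class="language-plaintext highlighter-rouge">with_database</code> and the sample credentials are hypothetical, and in App Platform itself you would keep using the <code class="language-plaintext highlighter-rouge">${db.USERNAME}</code>-style bindings.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from urllib.parse import urlsplit, urlunsplit


def with_database(database_url, database):
    """Return database_url pointing at /database, keeping everything else.

    Scheme, credentials, host and query string (authSource, replicaSet, tls)
    are left untouched; only the path changes.
    """
    parts = urlsplit(database_url)
    return urlunsplit(parts._replace(path="/" + database))


# Hypothetical credentials and host; substitute the values from your app settings.
database_url = "mongodb+srv://user:secret@db-host/admin?authSource=admin&replicaSet=db&tls=true"
print(with_database(database_url, "app"))</code></pre></figure>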
<h4 id="finish">Finish</h4>
<p>Add a domain in app settings, update the DNS entry, and raise the DNS record TTL back up.</p>
<h3 id="cost-comparison">Cost Comparison</h3>
<p>My monthly server total was $134.39 ($96 for a s-8vcpu-16gb droplet, $4.89 for droplet snapshots, $19.20 for droplet backups, $10.00 for an external 100GB volume for MongoDB data, and $4.30 for volume snapshots).</p>
<p>Monthly app cost is $103 (5x$5 for basic apps, 4x$12 for pro, $30.00 for a shared 1gb-1vcpu-15gb MongoDB).</p>
<p>Apps are about $31 a month (23%) cheaper than the droplet for roughly the same capacity and availability, and I no longer have to manage any infrastructure.</p>
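<p>A quick Python check of the arithmetic, using the line items from the two totals above; <code class="language-plaintext highlighter-rouge">Decimal</code> avoids floating-point rounding errors in the dollar amounts.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from decimal import Decimal

# Monthly line items from the comparison above, in US dollars.
droplet = {
    "s-8vcpu-16gb droplet": Decimal("96.00"),
    "droplet snapshots": Decimal("4.89"),
    "droplet backups": Decimal("19.20"),
    "100GB volume": Decimal("10.00"),
    "volume snapshots": Decimal("4.30"),
}
apps = {
    "5 basic apps at $5": 5 * Decimal("5"),
    "4 pro apps at $12": 4 * Decimal("12"),
    "shared MongoDB": Decimal("30.00"),
}

droplet_total = sum(droplet.values())
apps_total = sum(apps.values())
savings = droplet_total - apps_total
# Savings as a share of the old bill, rounded to one decimal place.
percent = (savings / droplet_total * 100).quantize(Decimal("0.1"))

print(f"droplet ${droplet_total}, apps ${apps_total}, saving ${savings} ({percent}%)")</code></pre></figure>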
<p>I think DigitalOcean apps are priced very well for my use-case. If you’ve never used the platform, sign up for an account using <a href="https://m.do.co/c/5b26011f9a9b">my referral link</a>, and thank you.</p>
<p><a href="https://code.dblock.org/2023/01/15/migrating-from-dokku-to-digital-ocean-apps.html">Migrating from Dokku to DigitalOcean Apps</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on January 15, 2023.</p>https://code.dblock.org/2022/12/27/programming-languages2022-12-27T00:00:00+00:002022-12-27T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>I got stuck somewhere in British Columbia during the US “bomb” cyclone, with hours to spare, so I decided to finish <a href="/2022/07/11/making-sigv4-authenticated-requests-to-managed-opensearch.html">implementing samples that call OpenSearch with SigV4 signing</a> in each of the 8 existing language clients.</p>
<p>All these samples perform the same operations: make an instance of a client, query and display the server version, create an index called <code class="language-plaintext highlighter-rouge">movies</code>, insert a record into movies for Bennett Miller’s 2011 “Moneyball”, search for “miller”, output the result, then clean up by deleting the record, and then the empty index. I have no idea why this specific film was chosen in all Elasticsearch documentation; I would have chosen Andrei Tarkovsky’s 1972 “Solaris”.</p>
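<p>Stripped of client-specific DSLs, the samples boil down to six REST calls. Here is a Python sketch that only lists them as (method, path, body) tuples without sending anything. The document fields, the document ID, the <code class="language-plaintext highlighter-rouge">multi_match</code> query, and <code class="language-plaintext highlighter-rouge">refresh=true</code> (which makes the document searchable immediately) are my assumptions, not necessarily what each sample does verbatim; <code class="language-plaintext highlighter-rouge">demo_requests</code> is a hypothetical helper.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import json


def demo_requests(index="movies", doc_id="1"):
    """List the (method, path, body) REST calls the samples make, in order.

    Nothing is sent; pass each tuple to any HTTP client (SigV4-signed for AWS).
    """
    document = {"title": "Moneyball", "director": "Bennett Miller", "year": 2011}
    query = {"query": {"multi_match": {"query": "miller", "fields": ["title", "director"]}}}
    return [
        ("GET", "/", None),                                         # server info, including the version
        ("PUT", f"/{index}", None),                                 # create the index
        ("PUT", f"/{index}/_doc/{doc_id}?refresh=true", document),  # insert, searchable right away
        ("POST", f"/{index}/_search", query),                       # search for "miller"
        ("DELETE", f"/{index}/_doc/{doc_id}", None),                # delete the document
        ("DELETE", f"/{index}", None),                              # delete the now-empty index
    ]


for method, path, body in demo_requests():
    print(method, path, json.dumps(body) if body else "")</code></pre></figure>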
<p>My only conclusion from this exercise is that the Go programming language is <a href="https://jesseduffield.com/Gos-Shortcomings-1/">objectively insane</a>.</p>
<table>
<thead>
<tr>
<th>client</th>
<th>lines of code</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://github.com/dblock/opensearch-go-client-demo">go</a></td>
<td>89</td>
</tr>
<tr>
<td><a href="https://github.com/dblock/opensearch-java-client-demo">java</a></td>
<td>80</td>
</tr>
<tr>
<td><a href="https://github.com/dblock/opensearch-rust-client-demo">rust</a></td>
<td>75</td>
</tr>
<tr>
<td><a href="https://github.com/dblock/opensearch-dotnet-client-demo">dotnet</a></td>
<td>68</td>
</tr>
<tr>
<td><a href="https://github.com/dblock/opensearch-python-client-demo">python</a></td>
<td>50</td>
</tr>
<tr>
<td><a href="https://github.com/dblock/opensearch-php-client-demo">php</a></td>
<td>50</td>
</tr>
<tr>
<td><a href="https://github.com/dblock/opensearch-node-client-demo">node</a></td>
<td>45</td>
</tr>
<tr>
<td><a href="https://github.com/dblock/opensearch-ruby-client-demo">ruby</a></td>
<td>44</td>
</tr>
</tbody>
</table>
<p><a href="https://code.dblock.org/2022/12/27/programming-languages.html">Programming Languages in 2022</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on December 27, 2022.</p>https://code.dblock.org/2022/08/08/managing-github-notifications-in-gmail2022-08-08T00:00:00+00:002022-08-08T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>My e-mail inbox is flooded with GitHub notifications, just like yours.</p>
<p>Having tried half a dozen ways to manage a queue of notifications, I settled on this: the first time I receive a notification for a repo I am subscribed to on GitHub, I create a filter that labels the email and removes it from the inbox.</p>
<p><img src="https://code.dblock.org/images/posts/2022/2022-08-08-managing-github-notifications-in-gmail/rules.gif" /></p>
<p>Then I check those emails in bulk and never miss anything.</p>
<p><img src="https://code.dblock.org/images/posts/2022/2022-08-08-managing-github-notifications-in-gmail/labels.gif" /></p>
<p>Click <a href="https://twitter.com/dblockdotorg/status/1452985177530064898">here</a> for the original tweet-sized version of the above.</p>
<p><a href="https://code.dblock.org/2022/08/08/managing-github-notifications-in-gmail.html">Managing GitHub Notifications in GMail</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on August 08, 2022.</p>https://code.dblock.org/2022/07/11/making-sigv4-authenticated-requests-to-managed-opensearch2022-07-11T00:00:00+00:002022-07-11T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p><a href="https://aws.amazon.com/opensearch-service/">Amazon OpenSearch</a> and <a href="https://aws.amazon.com/opensearch-service/features/serverless/">Amazon OpenSearch Serverless</a> use AWS SigV4 for authentication. We’ve made it dead easy to make authenticated requests across all OpenSearch clients in <a href="https://github.com/opensearch-project/opensearch-clients/issues/22">opensearch-clients#22</a>.</p>
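<p>Under the hood, every SigV4 signer derives a signing key by chaining HMAC-SHA256 over the date, region, and service name, then uses that key to sign a “string to sign” built from the canonical request. A minimal Python sketch of the key derivation and final signature using only the standard library; it assumes you already have the string to sign, omits building the canonical request, and is an illustration rather than a substitute for the client signers below.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import hashlib
import hmac


def _hmac_sha256(key, message):
    # One HMAC-SHA256 step; key is bytes, message is a str.
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_access_key, date, region, service):
    """Derive the SigV4 signing key for one day, region, and service.

    date is YYYYMMDD, e.g. "20220711".
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def signature(secret_access_key, date, region, service, string_to_sign):
    """Hex-encoded SigV4 signature of an already-built string to sign."""
    key = signing_key(secret_access_key, date, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Managed OpenSearch domains sign for service "es"; OpenSearch Serverless uses "aoss".
print(signature("example-secret", "20220711", "us-west-2", "es", "example string to sign"))</code></pre></figure>
<p>Because the key depends only on the date, region, and service, signers can cache it for a day and reuse it across requests.</p>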
<h3 id="command-line">Command Line</h3>
<h4 id="curl"><a href="https://curl.se/">curl</a></h4>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">export </span><span class="nv">AWS_ACCESS_KEY_ID</span><span class="o">=</span>...
<span class="nb">export </span><span class="nv">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span>...
<span class="nb">export </span><span class="nv">AWS_SESSION_TOKEN</span><span class="o">=</span>...
curl <span class="se">\</span>
<span class="nt">--verbose</span> <span class="se">\</span>
<span class="nt">--request</span> GET <span class="s2">"https://...us-west-2.es.amazonaws.com"</span> <span class="se">\</span>
<span class="nt">--aws-sigv4</span> <span class="s2">"aws:amz:us-west-2:es"</span> <span class="se">\</span>
<span class="nt">--user</span> <span class="s2">"</span><span class="nv">$AWS_ACCESS_KEY_ID</span><span class="s2">:</span><span class="nv">$AWS_SECRET_ACCESS_KEY</span><span class="s2">"</span> <span class="se">\</span>
<span class="nt">-H</span> <span class="s2">"x-amz-security-token:</span><span class="nv">$AWS_SESSION_TOKEN</span><span class="s2">"</span></code></pre></figure>
<p>If you want to <code class="language-plaintext highlighter-rouge">PUT</code> a document with <code class="language-plaintext highlighter-rouge">curl</code>, you need a request body and, for Amazon OpenSearch Serverless, the <code class="language-plaintext highlighter-rouge">x-amz-content-sha256</code> header. See <a href="https://gist.github.com/dblock/8dca2faba28a26e229676932763bd6c8#file-opensearch-curl-knn-sh">this gist</a> for a full example that inserts some vectors and performs an approximate nearest neighbor search.</p>
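<p>The value of <code class="language-plaintext highlighter-rouge">x-amz-content-sha256</code> is the hex-encoded SHA-256 hash of the exact request body bytes. A minimal Python sketch that computes it for a JSON document; the document is only an example, and the hash must be taken over the same bytes you actually send.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import hashlib
import json


def content_sha256(body):
    """Hex SHA-256 of the request body bytes, as sent in x-amz-content-sha256."""
    return hashlib.sha256(body).hexdigest()


# Serialize once and reuse the same bytes for both the hash and the request body.
body = json.dumps({"title": "Moneyball", "year": 2011}).encode("utf-8")
headers = {
    "Content-Type": "application/json",
    "x-amz-content-sha256": content_sha256(body),
}
print(headers["x-amz-content-sha256"])</code></pre></figure>
<p>Requests without a body, such as a plain <code class="language-plaintext highlighter-rouge">GET</code>, hash the empty string.</p>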
<h4 id="awscurl"><a href="https://github.com/okigan/awscurl">awscurl</a></h4>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">export </span><span class="nv">AWS_ACCESS_KEY_ID</span><span class="o">=</span>...
<span class="nb">export </span><span class="nv">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span>...
<span class="nb">export </span><span class="nv">AWS_SESSION_TOKEN</span><span class="o">=</span>...
awscurl <span class="se">\</span>
<span class="s2">"https://search...us-west-2.es.amazonaws.com"</span> <span class="se">\</span>
<span class="nt">--region</span> us-west-2 <span class="se">\</span>
<span class="nt">--service</span> es</code></pre></figure>
<p>See <a href="https://gist.github.com/dblock/8dca2faba28a26e229676932763bd6c8#file-opensearch-awscurl-sh">this gist</a> for a full example that inserts some vectors and performs an approximate nearest neighbor search.</p>
<h4 id="aws-es-curl"><a href="https://github.com/joona/aws-es-curl">aws-es-curl</a></h4>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">aws-es-curl <span class="se">\</span>
<span class="s2">"https://search...us-west-2.es.amazonaws.com"</span> <span class="se">\</span>
<span class="nt">--region</span> us-west-2</code></pre></figure>
<h3 id="java">Java</h3>
<h4 id="opensearch-java"><a href="https://github.com/opensearch-project/opensearch-java">opensearch-java</a></h4>
<p>Use <code class="language-plaintext highlighter-rouge">AwsSdk2Transport</code> introduced in opensearch-java 2.1.0. This is the latest recommended approach.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">SdkHttpClient</span> <span class="n">httpClient</span> <span class="o">=</span> <span class="nc">ApacheHttpClient</span><span class="o">.</span><span class="na">builder</span><span class="o">().</span><span class="na">build</span><span class="o">();</span>
<span class="nc">OpenSearchClient</span> <span class="n">client</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">OpenSearchClient</span><span class="o">(</span>
<span class="k">new</span> <span class="nf">AwsSdk2Transport</span><span class="o">(</span>
<span class="n">httpClient</span><span class="o">,</span>
<span class="s">"search-...us-west-2.es.amazonaws.com"</span><span class="o">,</span>
<span class="nc">Region</span><span class="o">.</span><span class="na">US_WEST_2</span><span class="o">,</span>
<span class="nc">AwsSdk2TransportOptions</span><span class="o">.</span><span class="na">builder</span><span class="o">().</span><span class="na">build</span><span class="o">()</span>
<span class="o">)</span>
<span class="o">);</span>
<span class="nc">InfoResponse</span> <span class="n">info</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="na">info</span><span class="o">();</span>
<span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">info</span><span class="o">.</span><span class="na">version</span><span class="o">().</span><span class="na">distribution</span><span class="o">()</span> <span class="o">+</span> <span class="s">": "</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">version</span><span class="o">().</span><span class="na">number</span><span class="o">());</span>
<span class="n">httpClient</span><span class="o">.</span><span class="na">close</span><span class="o">();</span></code></pre></figure>
<p>Working demo in <a href="https://github.com/dblock/opensearch-java-client-demo">opensearch-java-client-demo</a>.</p>
<h4 id="aws-request-signing-apache-interceptor"><a href="https://github.com/acm19/aws-request-signing-apache-interceptor">aws-request-signing-apache-interceptor</a></h4>
<p>Use an interceptor and any Apache REST client, including <code class="language-plaintext highlighter-rouge">RestHighLevelClient</code>.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">HttpRequestInterceptor</span> <span class="n">interceptor</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">AwsRequestSigningApacheInterceptor</span><span class="o">(</span>
<span class="s">"es"</span><span class="o">,</span>
<span class="nc">Aws4Signer</span><span class="o">.</span><span class="na">create</span><span class="o">(),</span>
<span class="nc">DefaultCredentialsProvider</span><span class="o">.</span><span class="na">create</span><span class="o">(),</span>
<span class="nc">Region</span><span class="o">.</span><span class="na">US_WEST_2</span>
<span class="o">);</span>
<span class="nc">CloseableHttpClient</span> <span class="n">httpClient</span> <span class="o">=</span> <span class="nc">HttpClients</span><span class="o">.</span><span class="na">custom</span><span class="o">()</span>
<span class="o">.</span><span class="na">addInterceptorLast</span><span class="o">(</span><span class="n">interceptor</span><span class="o">)</span>
<span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="nc">HttpGet</span> <span class="n">httpGet</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">HttpGet</span><span class="o">(</span><span class="s">"https://..."</span><span class="o">);</span>
<span class="nc">CloseableHttpResponse</span> <span class="n">httpResponse</span> <span class="o">=</span> <span class="n">httpClient</span><span class="o">.</span><span class="na">execute</span><span class="o">(</span><span class="n">httpGet</span><span class="o">);</span>
<span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">httpResponse</span><span class="o">.</span><span class="na">getStatusLine</span><span class="o">());</span>
<span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="nc">IoUtils</span><span class="o">.</span><span class="na">toUtf8String</span><span class="o">(</span><span class="n">httpResponse</span><span class="o">.</span><span class="na">getEntity</span><span class="o">().</span><span class="na">getContent</span><span class="o">()));</span></code></pre></figure>
<p>You can see a working demo in the <a href="https://github.com/acm19/aws-request-signing-apache-interceptor">interceptor code</a>. For an example that uses OpenSearch <code class="language-plaintext highlighter-rouge">RestHighLevelClient</code>, see <a href="https://github.com/dblock/opensearch-java-client-demo/tree/opensearch-1.x">1.x</a> or <a href="https://github.com/dblock/opensearch-java-client-demo/tree/opensearch-2.x">2.x</a>, depending on your version.</p>
<h3 id="ruby">Ruby</h3>
<h4 id="opensearch-ruby"><a href="https://github.com/opensearch-project/opensearch-ruby">opensearch-ruby</a></h4>
<p>Use <a href="https://rubygems.org/gems/opensearch-aws-sigv4">opensearch-aws-sigv4</a> 1.0 or newer.</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span class="n">signer</span> <span class="o">=</span> <span class="no">Aws</span><span class="o">::</span><span class="no">Sigv4</span><span class="o">::</span><span class="no">Signer</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
<span class="ss">service: </span><span class="s1">'es'</span><span class="p">,</span>
<span class="ss">region: </span><span class="s1">'us-west-2'</span><span class="p">,</span>
<span class="ss">access_key_id: </span><span class="no">ENV</span><span class="p">[</span><span class="s1">'AWS_ACCESS_KEY_ID'</span><span class="p">],</span>
<span class="ss">secret_access_key: </span><span class="no">ENV</span><span class="p">[</span><span class="s1">'AWS_SECRET_ACCESS_KEY'</span><span class="p">],</span>
<span class="ss">session_token: </span><span class="no">ENV</span><span class="p">[</span><span class="s1">'AWS_SESSION_TOKEN'</span><span class="p">]</span>
<span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="no">OpenSearch</span><span class="o">::</span><span class="no">Aws</span><span class="o">::</span><span class="no">Sigv4Client</span><span class="p">.</span><span class="nf">new</span><span class="p">({</span>
<span class="ss">host: </span><span class="s1">'https://...'</span>
<span class="p">},</span> <span class="n">signer</span><span class="p">)</span>
<span class="n">info</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="nf">info</span>
<span class="nb">puts</span> <span class="n">info</span><span class="p">[</span><span class="s1">'version'</span><span class="p">][</span><span class="s1">'distribution'</span><span class="p">]</span> <span class="o">+</span> <span class="s1">': '</span> <span class="o">+</span> <span class="n">info</span><span class="p">[</span><span class="s1">'version'</span><span class="p">][</span><span class="s1">'number'</span><span class="p">]</span></code></pre></figure>
<p>Working demo in <a href="https://github.com/dblock/opensearch-ruby-client-demo">opensearch-ruby-client-demo</a>.</p>
<h3 id="nodejs">Node.js</h3>
<h4 id="opensearch-js"><a href="https://github.com/opensearch-project/opensearch-js">opensearch-js</a></h4>
<p>Use <a href="https://www.npmjs.com/package/@opensearch-project/opensearch">@opensearch-project/opensearch</a> 2.x.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="kd">const</span> <span class="nx">client</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Client</span><span class="p">({</span>
<span class="p">...</span><span class="nx">AwsSigv4Signer</span><span class="p">({</span>
<span class="na">region</span><span class="p">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">AWS_REGION</span> <span class="o">||</span> <span class="dl">'</span><span class="s1">us-east-1</span><span class="dl">'</span><span class="p">,</span>
<span class="na">getCredentials</span><span class="p">:</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">credentialsProvider</span> <span class="o">=</span> <span class="nx">defaultProvider</span><span class="p">();</span>
<span class="k">return</span> <span class="nx">credentialsProvider</span><span class="p">();</span>
<span class="p">},</span>
<span class="p">}),</span>
<span class="na">node</span><span class="p">:</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">OPENSEARCH_ENDPOINT</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">info</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">client</span><span class="p">.</span><span class="nx">info</span><span class="p">();</span>
<span class="kd">const</span> <span class="nx">version</span> <span class="o">=</span> <span class="nx">info</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">version</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">version</span><span class="p">.</span><span class="nx">distribution</span> <span class="o">+</span> <span class="dl">"</span><span class="s2">: </span><span class="dl">"</span> <span class="o">+</span> <span class="nx">version</span><span class="p">.</span><span class="kr">number</span><span class="p">);</span></code></pre></figure>
<p>Working demo in <a href="https://github.com/dblock/opensearch-node-client-demo">opensearch-node-client-demo</a>.</p>
<h3 id="python">Python</h3>
<h4 id="opensearch-py"><a href="https://github.com/opensearch-project/opensearch-py">opensearch-py</a></h4>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">url</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">environ</span><span class="p">[</span><span class="s">'OPENSEARCH_ENDPOINT'</span><span class="p">])</span>
<span class="n">region</span> <span class="o">=</span> <span class="n">environ</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'AWS_REGION'</span><span class="p">,</span> <span class="s">'us-east-1'</span><span class="p">)</span>
<span class="n">credentials</span> <span class="o">=</span> <span class="n">Session</span><span class="p">().</span><span class="n">get_credentials</span><span class="p">()</span>
<span class="n">auth</span> <span class="o">=</span> <span class="n">AWSV4SignerAuth</span><span class="p">(</span><span class="n">credentials</span><span class="p">,</span> <span class="n">region</span><span class="p">)</span>
<span class="n">client</span> <span class="o">=</span> <span class="n">OpenSearch</span><span class="p">(</span>
<span class="n">hosts</span><span class="o">=</span><span class="p">[{</span>
<span class="s">'host'</span><span class="p">:</span> <span class="n">url</span><span class="p">.</span><span class="n">hostname</span><span class="p">,</span>
<span class="s">'port'</span><span class="p">:</span> <span class="n">url</span><span class="p">.</span><span class="n">port</span> <span class="ow">or</span> <span class="mi">443</span>
<span class="p">}],</span>
<span class="n">http_auth</span><span class="o">=</span><span class="n">auth</span><span class="p">,</span>
<span class="n">use_ssl</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">verify_certs</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">connection_class</span><span class="o">=</span><span class="n">RequestsHttpConnection</span>
<span class="p">)</span>
<span class="n">info</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">info</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">info</span><span class="p">[</span><span class="s">'version'</span><span class="p">][</span><span class="s">'distribution'</span><span class="p">]</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">info</span><span class="p">[</span><span class="s">'version'</span><span class="p">][</span><span class="s">'number'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span></code></pre></figure>
<p>Working demo in <a href="https://github.com/dblock/opensearch-python-client-demo">opensearch-python-client-demo</a>.</p>
<h3 id="dotnet">DotNet</h3>
<h4 id="opensearch-net"><a href="https://github.com/opensearch-project/opensearch-net">opensearch-net</a></h4>
<p>Use <a href="https://www.nuget.org/packages/OpenSearch.Client">OpenSearch.Client</a> 1.2.0 or newer.</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="kt">var</span> <span class="n">endpoint</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Uri</span><span class="p">(</span><span class="n">Environment</span><span class="p">.</span><span class="nf">GetEnvironmentVariable</span><span class="p">(</span><span class="s">"OPENSEARCH_ENDPOINT"</span><span class="p">)</span> <span class="p">??</span> <span class="k">throw</span> <span class="k">new</span> <span class="nf">ArgumentNullException</span><span class="p">(</span><span class="s">"Missing OPENSEARCH_ENDPOINT."</span><span class="p">));</span>
<span class="kt">var</span> <span class="n">region</span> <span class="p">=</span> <span class="n">Amazon</span><span class="p">.</span><span class="n">RegionEndpoint</span><span class="p">.</span><span class="nf">GetBySystemName</span><span class="p">(</span><span class="n">Environment</span><span class="p">.</span><span class="nf">GetEnvironmentVariable</span><span class="p">(</span><span class="s">"AWS_REGION"</span><span class="p">)</span> <span class="p">??</span> <span class="s">"us-east-1"</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">connection</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">AwsSigV4HttpConnection</span><span class="p">(</span><span class="n">region</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">config</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">ConnectionSettings</span><span class="p">(</span><span class="n">endpoint</span><span class="p">,</span> <span class="n">connection</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">client</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">OpenSearchClient</span><span class="p">(</span><span class="n">config</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">info</span> <span class="p">=</span> <span class="n">client</span><span class="p">.</span><span class="nf">RootNodeInfo</span><span class="p">();</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="s">$"</span><span class="p">{</span><span class="n">info</span><span class="p">.</span><span class="n">Version</span><span class="p">.</span><span class="n">Distribution</span><span class="p">}</span><span class="s">: </span><span class="p">{</span><span class="n">info</span><span class="p">.</span><span class="n">Version</span><span class="p">.</span><span class="n">Number</span><span class="p">}</span><span class="s">"</span><span class="p">);</span></code></pre></figure>
<p>Working demo in <a href="https://github.com/dblock/opensearch-dotnet-client-demo">opensearch-dotnet-client-demo</a>.</p>
<h3 id="rust">Rust</h3>
<h4 id="opensearch-rs"><a href="https://docs.rs/opensearch/latest/opensearch/">opensearch-rs</a></h4>
<figure class="highlight"><pre><code class="language-rust" data-lang="rust"><span class="k">let</span> <span class="n">url</span> <span class="o">=</span> <span class="nn">Url</span><span class="p">::</span><span class="nf">parse</span><span class="p">(</span><span class="o">&</span><span class="nn">env</span><span class="p">::</span><span class="nf">var</span><span class="p">(</span><span class="s">"OPENSEARCH_ENDPOINT"</span><span class="p">)</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"Missing OPENSEARCH_ENDPOINT"</span><span class="p">));</span>
<span class="k">let</span> <span class="n">conn_pool</span> <span class="o">=</span> <span class="nn">SingleNodeConnectionPool</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="o">?</span><span class="p">);</span>
<span class="k">let</span> <span class="n">aws_config</span> <span class="o">=</span> <span class="nn">aws_config</span><span class="p">::</span><span class="nf">load_from_env</span><span class="p">()</span><span class="k">.await</span><span class="nf">.clone</span><span class="p">();</span>
<span class="k">let</span> <span class="n">transport</span> <span class="o">=</span> <span class="nn">TransportBuilder</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">conn_pool</span><span class="p">)</span><span class="nf">.auth</span><span class="p">(</span><span class="n">aws_config</span><span class="nf">.clone</span><span class="p">()</span><span class="nf">.try_into</span><span class="p">()</span><span class="o">?</span><span class="p">)</span><span class="nf">.build</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
<span class="k">let</span> <span class="n">client</span> <span class="o">=</span> <span class="nn">OpenSearch</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">transport</span><span class="p">);</span>
<span class="k">let</span> <span class="n">info</span><span class="p">:</span> <span class="n">Value</span> <span class="o">=</span> <span class="n">client</span><span class="nf">.info</span><span class="p">()</span><span class="nf">.send</span><span class="p">()</span><span class="k">.await</span><span class="o">?</span><span class="nf">.json</span><span class="p">()</span><span class="k">.await</span><span class="o">?</span><span class="p">;</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"{}: {}"</span><span class="p">,</span> <span class="n">info</span><span class="p">[</span><span class="s">"version"</span><span class="p">][</span><span class="s">"distribution"</span><span class="p">]</span><span class="nf">.as_str</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">(),</span> <span class="n">info</span><span class="p">[</span><span class="s">"version"</span><span class="p">][</span><span class="s">"number"</span><span class="p">]</span><span class="nf">.as_str</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">());</span></code></pre></figure>
<p>Working demo in <a href="https://github.com/dblock/opensearch-rust-client-demo">opensearch-rust-client-demo</a>.</p>
<h3 id="php">PHP</h3>
<h4 id="opensearch-php"><a href="https://github.com/opensearch-project/opensearch-php">opensearch-php</a></h4>
<figure class="highlight"><pre><code class="language-php" data-lang="php"><span class="nv">$client</span> <span class="o">=</span> <span class="p">(</span><span class="k">new</span> <span class="err">\</span><span class="nf">OpenSearch\ClientBuilder</span><span class="p">())</span>
<span class="o">-></span><span class="nf">setHosts</span><span class="p">([</span><span class="nb">getenv</span><span class="p">(</span><span class="s2">"OPENSEARCH_ENDPOINT"</span><span class="p">)])</span>
<span class="o">-></span><span class="nf">setSigV4Region</span><span class="p">(</span><span class="nb">getenv</span><span class="p">(</span><span class="s2">"AWS_REGION"</span><span class="p">))</span>
<span class="o">-></span><span class="nf">setSigV4CredentialProvider</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
<span class="o">-></span><span class="nf">build</span><span class="p">();</span>
<span class="nv">$info</span> <span class="o">=</span> <span class="nv">$client</span><span class="o">-></span><span class="nf">info</span><span class="p">();</span>
<span class="k">echo</span> <span class="s2">"</span><span class="si">{</span><span class="nv">$info</span><span class="p">[</span><span class="s1">'version'</span><span class="p">][</span><span class="s1">'distribution'</span><span class="p">]</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="nv">$info</span><span class="p">[</span><span class="s1">'version'</span><span class="p">][</span><span class="s1">'number'</span><span class="p">]</span><span class="si">}</span><span class="se">\n</span><span class="s2">"</span><span class="p">;</span></code></pre></figure>
<p>Working demo in <a href="https://github.com/dblock/opensearch-php-client-demo">opensearch-php-client-demo</a>.</p>
<h3 id="go">Go</h3>
<h4 id="opensearch-go"><a href="https://github.com/opensearch-project/opensearch-go">opensearch-go</a></h4>
<figure class="highlight"><pre><code class="language-go" data-lang="go"><span class="n">ctx</span> <span class="o">:=</span> <span class="n">context</span><span class="o">.</span><span class="n">Background</span><span class="p">()</span>
<span class="n">cfg</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">config</span><span class="o">.</span><span class="n">LoadDefaultConfig</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span>
<span class="n">signer</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">requestsigner</span><span class="o">.</span><span class="n">NewSigner</span><span class="p">(</span><span class="n">cfg</span><span class="p">)</span>
<span class="n">endpoint</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">os</span><span class="o">.</span><span class="n">LookupEnv</span><span class="p">(</span><span class="s">"OPENSEARCH_ENDPOINT"</span><span class="p">)</span>
<span class="n">client</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">opensearch</span><span class="o">.</span><span class="n">NewClient</span><span class="p">(</span><span class="n">opensearch</span><span class="o">.</span><span class="n">Config</span><span class="p">{</span>
	<span class="n">Addresses</span><span class="o">:</span> <span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="n">endpoint</span><span class="p">},</span>
	<span class="n">Signer</span><span class="o">:</span> <span class="n">signer</span><span class="p">,</span>
<span class="p">})</span>
<span class="k">if</span> <span class="n">info</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">client</span><span class="o">.</span><span class="n">Info</span><span class="p">();</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
	<span class="n">log</span><span class="o">.</span><span class="n">Fatalf</span><span class="p">(</span><span class="s">"info: %v"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
	<span class="k">defer</span> <span class="n">info</span><span class="o">.</span><span class="n">Body</span><span class="o">.</span><span class="n">Close</span><span class="p">()</span>
	<span class="k">var</span> <span class="n">r</span> <span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="k">interface</span><span class="p">{}</span>
	<span class="n">json</span><span class="o">.</span><span class="n">NewDecoder</span><span class="p">(</span><span class="n">info</span><span class="o">.</span><span class="n">Body</span><span class="p">)</span><span class="o">.</span><span class="n">Decode</span><span class="p">(</span><span class="o">&</span><span class="n">r</span><span class="p">)</span>
	<span class="n">version</span> <span class="o">:=</span> <span class="n">r</span><span class="p">[</span><span class="s">"version"</span><span class="p">]</span><span class="o">.</span><span class="p">(</span><span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="k">interface</span><span class="p">{})</span>
	<span class="n">fmt</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"%s: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">version</span><span class="p">[</span><span class="s">"distribution"</span><span class="p">],</span> <span class="n">version</span><span class="p">[</span><span class="s">"number"</span><span class="p">])</span>
<span class="p">}</span></code></pre></figure>
<p>Working demo in <a href="https://github.com/dblock/opensearch-go-client-demo">opensearch-go-client-demo</a>.</p>
<p><a href="https://code.dblock.org/2022/07/11/making-sigv4-authenticated-requests-to-managed-opensearch.html">Making AWS SigV4 Authenticated Requests to Amazon OpenSearch</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on July 11, 2022.</p>https://code.dblock.org/2022/06/05/a-year-working-on-opensearch-2.02022-06-05T00:00:00+00:002022-06-05T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>I’ve now been at Amazon for 3 years, and it has been a year since I joined <a href="https://opensearch.org/">OpenSearch</a>, <a href="https://aws.amazon.com/blogs/opensource/introducing-opensearch/">a community-driven, open source fork of Elasticsearch and Kibana</a>. Last week we <a href="https://opensearch.org/blog/releases/2022/05/opensearch-2-0-is-now-available/">released OpenSearch 2.0</a>. Given that it’s already the end of May, it’s time for my first blog post of 2022.</p>
<p>Many things are going really well.</p>
<p>The first question anyone asks me is whether I am writing any code. I’ve had <a href="https://github.com/pulls?q=is:pr+author:dblock+archived:false+user:opensearch-project+is:closed+is:merged">484 pull requests merged</a> into opensearch-project, out of <a href="https://github.com/pulls?q=is:pr+author:dblock+archived:false+user:opensearch-project+is:closed">573 pull requests raised</a>. (The fact that roughly one in six was not merged probably means that I don’t know what I am doing about 15% of the time.) There are tons of tiny bookkeeping changes, such as version increments, but there are also several meaty ones, half of them in <a href="https://github.com/opensearch-project/opensearch-build/pulls?q=is:pr+author:dblock+is:closed+is:merged">opensearch-build</a>. It turns out that continuously releasing two products (OpenSearch and OpenSearch Dashboards) with dozens of plug-ins each, for a big platform matrix (e.g. Linux, RPM/DEB, x64 and arm64), while working on 3 different releases simultaneously (right now 3.0, 2.1 and 1.3.4), along with half a dozen language clients (e.g. Java, JavaScript, Ruby, Go, and Rust) and integration tools (e.g. Logstash or Fluentd), is not easy! We ended up writing a manifest-driven build/test/release automation framework in Python to enable a release train. It worked well: whereas OpenSearch 1.0 took weeks to ship, we were able to cut 3 versions of the product during the log4j 0-day in a little over a weekend.</p>
<p>The confusion between Elasticsearch and OpenSearch <a href="https://venturebeat.com/2022/05/19/once-frenemies-elastic-and-aws-are-now-besties/">seems to have been cleared up</a>, too. Occasionally, users will ask whether a new feature of Elasticsearch will be available in OpenSearch (you’re welcome to contribute features without looking at any non-Apache-licensed code). And while OpenSearch will keep improving in a thousand small ways to be a delightful, secure experience for everyone, the future of the fork is decidedly cloud-native.</p>
<p>What’s that all about? I work in the “Search Services” AWS organization, which builds and operates the <a href="https://aws.amazon.com/opensearch-service/">Amazon OpenSearch Service</a>. The folks who wrote the control plane for that service are very strong cloud engineers, and the scale of the service is remarkable. For example, in 2020 Pinterest was <a href="https://aws.amazon.com/solutions/case-studies/pinterest-elasticsearch-case-study/">ingesting</a> 1.7TB of data daily, growing to 3TB that year. Since then, data volumes haven’t just grown exponentially; they have exploded. Hundreds of terabytes <em>per day</em> is no longer some crazy number in 2022, and you can draw a curve from there into the future. The big question now is not whether OpenSearch can support a few TB of data per day, but what OpenSearch needs to look like to support many hundreds, and how soon. We can no longer scale this monolith horizontally by adding more nodes; thus, the future of OpenSearch is decidedly cloud-native. This doesn’t mean you must run it in the cloud, much less on AWS. Simply put, in a cloud-native system every aspect of the software can scale independently, and the software is readily extensible and easily multi-tenant. For example, scaling reads can happen independently of scaling writes, and search can be scaled independently from indexing. Plug-ins can run in isolation with clear, safe boundaries and interfaces, and don’t require a cluster restart. Data access is secure.</p>
<p>As an example of a cloud-native evolution, consider <a href="https://www.amazon.science/latest-news/amazon-redshift-ten-years-of-continuous-reinvention">Amazon Redshift</a>. Similarly, in OpenSearch we’ve embarked on <a href="https://github.com/opensearch-project/OpenSearch/issues/2095">a journey</a> towards rethinking extensibility, storage, indexing and search. While I have not written much (or any) code in these areas, I’ve spent many hours brainstorming with various engineers about building an <a href="https://github.com/opensearch-project/OpenSearch/issues/2447">OpenSearch SDK</a> that will help decouple the engine from its extensions, <a href="https://github.com/opensearch-project/OpenSearch/issues/2578">refactoring and scaling storage</a>, starting with <a href="https://github.com/opensearch-project/OpenSearch/issues/2229">segment replication</a>, and much more. Most of these are not my ideas, but I believe I have helped folks feel safe making bigger bets and aiming high, while staying pragmatic and always writing code.</p>
<p>That said, don’t dismiss me too quickly as merely a cheerleader: I did help debug a customer problem in the managed service, and <a href="https://github.com/apache/lucene/pull/711">wrote a unit test in Lucene for the fix authored by a long-time Lucene committer and Elasticsearch engineer</a>.</p>
<p>Outside of code I like to persevere in areas where others would not.</p>
<p>I helped move <a href="https://github.com/opensearch-project/opensearch-plugin-template-java">opensearch-plugin-template-java</a> into the opensearch-project organization while keeping its original author, who doesn’t work for Amazon, as an external repo administrator (a first for Amazon open-source), and worked through a process of adding external maintainers to OpenSearch project repos, merged as <a href="https://github.com/opensearch-project/.github/pull/59">opensearch-project/.github#59</a> after 197 comments. This paved the way for our <a href="https://github.com/opensearch-project/OpenSearch/pull/2905">first external maintainer</a> in OpenSearch core. In some ways these changes were hard (you know what I’m talking about if you’ve ever navigated a large organization with senior decision makers that own a significant P&L), and in other ways they were easy, because everyone at AWS wanted this. In practice, someone just needed to do it, removing obstacles one by one. I like this work and believe that enabling others always has a much bigger impact than anything I could accomplish alone.</p>
<p>There are also some challenges.</p>
<p>I often hear that Amazon isn’t contributing enough to open-source, and I prefer to acknowledge that my colleagues and I can do more. So we do. As of today, I counted 191 out of 401 contributors to OpenSearch that don’t work for Amazon, two dozen Amazon contributors to Lucene, etc.</p>
<p>Across my larger organization, and AWS as a whole, open-source is still considered an “upstream” activity. Engineers working in proprietary software tend to implement solutions in their territory, and then to open-source some parts (they never get enough time to do it). Doing open-source is perceived as, at the very least, a time-consuming “expense”, or at most a “risk”. Neither is actually true. Open-source is cheaper to write, and solves a number of real problems: it eases access to a more diverse group of experts, improves collaboration in code, creates higher quality software when done right, favors longer term product and design thinking, reduces staff attrition, and improves transparency and security. Open-source software, such as the Apache-licensed OpenSearch, powers many businesses and delivers real customer value to anyone who cares to run the software. Some then choose to invest their time and money into development, while retaining the freedom to do whatever they want with the results.</p>
<p>I’m excited for the rest of 2022 and the <a href="https://github.com/orgs/opensearch-project/projects/1">OpenSearch Roadmap</a>. See you at <a href="https://opensearch.org/events/2022-0921-opensearchcon/">OpenSearchCon in Seattle</a> September 21!</p>
<p><a href="https://code.dblock.org/2022/06/05/a-year-working-on-opensearch-2.0.html">A Year Working on OpenSearch (2.0)</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on June 05, 2022.</p>https://code.dblock.org/2021/09/03/how-i-learned-rust-by-accident2021-09-03T00:00:00+00:002021-09-03T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>I had to quickly ramp up my Python over <a href="https://github.com/opensearch-project/opensearch-build/pulls?q=is%3Apr+is%3Aclosed+author%3Adblock">the past few weeks</a>. Mind, because Ruby is obviously better, I’ve never really written any Python in 20 years of programming. At least not production-grade Python with unit tests. Working on a real active codebase was the easiest and fastest way for me to learn, but I’m not telling you anything you didn’t already know.</p>
<p>Then yesterday, I accidentally “learned” Rust. I’m, obviously, still a total Rust noob, but at least now I know what <a href="https://doc.rust-lang.org/cargo/reference/manifest.html">Cargo and TOML</a> are, and I think I’m getting just sufficiently dangerous with it. Here’s the full story.</p>
<p>The OpenSearch project uses a link checker called <a href="https://github.com/lycheeverse/lychee">lychee</a> to ensure that links in the OpenSearch markdown docs work. The tool is open-source, and is written in <a href="https://www.rust-lang.org/">Rust</a>. Yesterday, the <a href="https://icu-project.org/">icu-project.org</a> website became a redirect, causing the link checker to <a href="https://github.com/opensearch-project/OpenSearch/issues/1199">fail</a>, blocking CI.</p>
<p>I started fixing CI by adding <code class="language-plaintext highlighter-rouge">icu-project.org</code> to the list of websites to exclude, and noticed that the GitHub action code that ran the link checker was already excluding a long list of URLs with <code class="language-plaintext highlighter-rouge">--exclude=website1 --exclude=website2 --exclude=...</code>. I read the Lychee documentation to see if it supported exclusion lists that could be stored in files. It didn’t, so I opened <a href="https://github.com/lycheeverse/lychee/issues/302">a feature request</a>. I was <a href="https://github.com/lycheeverse/lychee/issues/302#issuecomment-909599246">pointed</a> to the fact that Lychee supported config files, but I would still have to put exclusions into a long list.</p>
<p>CI couldn’t wait, but I was still not going to add a URL to a very long command-line. I devised a hack, and put the list of websites into a <code class="language-plaintext highlighter-rouge">.lycheeexclude</code> file, loaded the file into an environment variable with <code class="language-plaintext highlighter-rouge">LYCHEE_EXCLUDE=$(sed -e :a -e 'N;s/\n/ /;ta' .lycheeexclude)</code> inside the GitHub action code, used it with <code class="language-plaintext highlighter-rouge">--exclude ${{ env.LYCHEE_EXCLUDE }}</code>, and PRed this in <a href="https://github.com/opensearch-project/OpenSearch/pull/1189">OpenSearch#1189</a> and <a href="https://github.com/opensearch-project/OpenSearch/pull/1201">OpenSearch#1201</a>.</p>
<p>I decided to add the <code class="language-plaintext highlighter-rouge">--exclude-file</code> feature to Lychee, and began by checking out the Lychee code and trying to build it. After minimal Internet reading, I learned that one needed <a href="https://rustup.rs/">rustup</a> to get started, as opposed to just installing Rust. I added that to the <a href="https://github.com/lycheeverse/lychee/blob/master/README.md#contributing-to-lychee">Lychee README</a> for the next noob like me, and was able to run tests with <code class="language-plaintext highlighter-rouge">cargo test</code>.</p>
<p>The CI code linter was now complaining, so <a href="https://github.com/lycheeverse/lychee/commit/c5d75447cad2a665e9bb126f2a04090ebd6df7f5">I fixed all</a> but <a href="https://github.com/lycheeverse/lychee/pull/304#issuecomment-911603614">one problem</a> and <a href="https://github.com/lycheeverse/lychee/pull/304#issuecomment-912158085">asked for help</a>. I was now able to run <code class="language-plaintext highlighter-rouge">cargo clippy</code> and get a clean code lint.</p>
<p>I finally copy-pasted code from the existing <code class="language-plaintext highlighter-rouge">--exclude</code> implementation into similar-looking code for <code class="language-plaintext highlighter-rouge">--exclude-file</code>, copy-pasted more code from Stack Overflow to read a file line-by-line, <a href="https://github.com/lycheeverse/lychee/pull/306#discussion_r701921275">wrote some missing tests for the existing</a> <code class="language-plaintext highlighter-rouge">--exclude</code> feature, added tests for the new <code class="language-plaintext highlighter-rouge">--exclude-file</code>, then <a href="https://github.com/lycheeverse/lychee/pull/306">submitted a pull request</a>. With my fixes above, CI was passing, except for a publish check.</p>
<p>It took me a while to comprehend that Lychee is actually a library called <code class="language-plaintext highlighter-rouge">lychee-lib</code> and a binary called <code class="language-plaintext highlighter-rouge">lychee-bin</code>, and that the publish check was trying to dry-run publishing the lib first, then the binary. The publishing dry-run was failing with an unresolved import, <code class="language-plaintext highlighter-rouge">error[E0432]: unresolved import lychee_lib::collector::Collector</code>. This looked suspicious, as the <code class="language-plaintext highlighter-rouge">Collector</code> code had recently been added, breaking CI. I figured that the dry-run of the binary publication was picking up the previously released version of the lib, and not the current one. Incrementing the version in the source code of both the lib and the binary made this even more obvious, as the publication dry-run couldn’t find the new version of the lib.</p>
<p>I Googled the problem, and discovered <a href="https://crates.io/crates/cargo-publish-all">cargo-publish-all</a>, which was designed to address this exact scenario. However, that <a href="https://github.com/idanarye/rust-typed-builder/issues/57">failed</a> with an obscure <code class="language-plaintext highlighter-rouge">error[E0433]: failed to resolve: use of undeclared crate or module proc_macro</code> that came from <code class="language-plaintext highlighter-rouge">rust-typed-builder</code>, and had been <a href="https://gitlab.com/torkleyy/cargo-publish-all/-/issues/3">an open issue for over a year</a>. The error made no sense to anyone, but the maintainer of rust-typed-builder was able to come up with <a href="https://github.com/idanarye/rust-typed-builder/issues/57#issuecomment-912802451">a workaround</a>. A new version of that library, 0.9.1, was also cut.</p>
<p>I made a <a href="https://github.com/lycheeverse/lychee/pull/309">final pull request</a> to the publish workflow and CI went back to green!</p>
<p>To summarize, Lychee now has a <code class="language-plaintext highlighter-rouge">--exclude-file</code> feature and a working CI, while I got to learn Rust pretty much by accident. This would never have happened had I not been working in open-source by default. Oh, and it helped to be a bit persistent and not to give up on any of the problems encountered above.</p>
<p>Similar accidents have generated defining moments in my career. Will I end up writing Rust full time one day? We shall see!</p>
<p><a href="https://code.dblock.org/2021/09/03/how-i-learned-rust-by-accident.html">How I Learned Rust by Accident</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on September 03, 2021.</p>https://code.dblock.org/2021/09/03/generating-task-matrix-by-looping-over-repo-files-with-github-actions2021-09-03T00:00:00+00:002021-09-03T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>I’ve been having more fun with GitHub actions after <a href="/2021/08/13/automating-code-changes-with-github-actions-making-pull-requests.html">Automating Code Changes via GitHub Actions Making Pull Requests</a>. Let’s generate a job matrix from a list of files.</p>
<p>Why would I need that? In <a href="https://github.com/opensearch-project/opensearch-build/pull/386">opensearch-project/opensearch-build</a> we create manifest files that are used to produce an OpenSearch distribution. These files are created manually, one for every version. Each needs to be sanity-checked when created or changed.</p>
<p>These checks can be executed in parallel, so we can create a GitHub Actions matrix like so.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">name</span><span class="pi">:</span> <span class="s">manifests</span>
<span class="na">on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">push</span><span class="pi">,</span> <span class="nv">pull_request</span><span class="pi">]</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">check</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">strategy</span><span class="pi">:</span>
      <span class="na">matrix</span><span class="pi">:</span>
        <span class="na">manifest</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="s">manifests/opensearch-1.1.0.yml</span>
          <span class="pi">-</span> <span class="s">manifests/opensearch-1.0.0.yml</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>
      <span class="pi">-</span> <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
          <span class="s">./check-manifest ${{ matrix.manifest }}</span></code></pre></figure>
<p>We’ll definitely forget to update the matrix when a new file is created, so let’s just list those files dynamically, and generate a matrix from the list.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">name</span><span class="pi">:</span> <span class="s">manifests</span>
<span class="na">on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">push</span><span class="pi">,</span> <span class="nv">pull_request</span><span class="pi">]</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">list-manifests</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">outputs</span><span class="pi">:</span>
      <span class="na">matrix</span><span class="pi">:</span> <span class="s">${{ steps.set-matrix.outputs.matrix }}</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>
      <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">set-matrix</span>
        <span class="na">run</span><span class="pi">:</span> <span class="s">echo "::set-output name=matrix::$(ls manifests/*.yml | jq -R -s -c 'split("\n")[:-1]')"</span>
  <span class="na">check</span><span class="pi">:</span>
    <span class="na">needs</span><span class="pi">:</span> <span class="s">list-manifests</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">strategy</span><span class="pi">:</span>
      <span class="na">matrix</span><span class="pi">:</span>
        <span class="na">manifest</span><span class="pi">:</span> <span class="s">${{ fromJson(needs.list-manifests.outputs.matrix) }}</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>
      <span class="pi">-</span> <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
          <span class="s">./check-manifest ${{ matrix.manifest }}</span></code></pre></figure>
<p>Here’s how this works.</p>
<ol>
<li>A shell command <code class="language-plaintext highlighter-rouge">ls manifests/*.yml</code> lists all .yml files.</li>
<li>A pipe to <code class="language-plaintext highlighter-rouge">| jq -R -s -c 'split("\n")[:-1]'</code> transforms the file list into a JSON array (from <a href="https://stackoverflow.com/questions/10234327/convert-bash-ls-output-to-json-array">StackOverflow#10234327</a>). Note that <a href="https://github.com/actions/runner-images/blob/436da67f4bd24acde9d9119870203f9dbbcf3bbe/images/ubuntu/Ubuntu2004-Readme.md">jq is installed on all GHA Linux images</a>.</li>
<li>The <code class="language-plaintext highlighter-rouge">matrix</code> output is set to the JSON array of files using <a href="https://docs.github.com/en/actions/reference/workflow-commands-for-github-actions#setting-an-output-parameter">set-output</a> with <code class="language-plaintext highlighter-rouge">echo "::set-output name=matrix::value"</code>.</li>
<li>The <code class="language-plaintext highlighter-rouge">manifest</code> values are loaded from the JSON array using <a href="https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#fromjson">fromJson</a> and become part of the updated workflow.</li>
</ol>
<p>The awesome part is that the matrix is generated during the build!</p>
<p><img src="https://user-images.githubusercontent.com/542335/132070992-4e9ba64f-a8f4-4459-9102-95684de2cda7.png" alt="" /></p>
<h3 id="profit">Profit</h3>
<p>See <a href="https://github.com/opensearch-project/opensearch-build/pull/386">opensearch-project/opensearch-build#386</a> for a working example.</p>
<p><a href="https://code.dblock.org/2021/09/03/generating-task-matrix-by-looping-over-repo-files-with-github-actions.html">Generating a Task Matrix by Looping over Repo Files with GitHub Actions</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on September 03, 2021.</p>https://code.dblock.org/2021/08/13/automating-code-changes-with-github-actions-making-pull-requests2021-08-13T00:00:00+00:002021-08-13T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>You’ve probably been depending on automated pull requests from <a href="https://dependabot.com/">Dependabot</a>, but how about making your own pull requests from GitHub actions? This capability can be used for automation that looks for changes, then updates files in your own repository with little to no additional setup needed.</p>
<ul>
<li>In <a href="https://github.com/dblock/lost-robbies/blob/master/.github/workflows/check-sales.yml">dblock/lost-robbies</a> the workflow checks for new sales and raises a PR after updating the JSON data and the <code class="language-plaintext highlighter-rouge">README.md</code>.</li>
<li>In <a href="https://github.com/opensearch-project/project-meta/blob/main/.github/workflows/check-repos.yml">opensearch-project/project-meta</a> the workflow enumerates public repositories in the opensearch-project organization and adds new repos to a <code class="language-plaintext highlighter-rouge">.meta</code> file.</li>
</ul>
<p>Below are some implementation details from <a href="https://github.com/opensearch-project/project-meta/blob/main/.github/workflows/check-repos.yml">opensearch-project/project-meta</a>.</p>
<h3 id="github-action-setup">GitHub Action Setup</h3>
<p>The job runs on every push to <code class="language-plaintext highlighter-rouge">main</code> and daily at midnight UTC (GitHub Actions cron schedules use UTC).</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">name</span><span class="pi">:</span> <span class="s">Check for new Project Repos</span>
<span class="na">on</span><span class="pi">:</span>
  <span class="na">push</span><span class="pi">:</span>
    <span class="na">branches</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">main</span>
  <span class="na">schedule</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">cron</span><span class="pi">:</span> <span class="s2">"</span><span class="s">0</span><span class="nv"> </span><span class="s">0</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*"</span></code></pre></figure>
<h3 id="permissions-and-tokens">Permissions and Tokens</h3>
<p>The job checks out the code and needs a <code class="language-plaintext highlighter-rouge">GITHUB_TOKEN</code> in <code class="language-plaintext highlighter-rouge">env</code> to make pull requests.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">jobs</span><span class="pi">:</span>
  <span class="na">check-project-repos</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Update project repositories</span>
        <span class="na">env</span><span class="pi">:</span>
          <span class="na">GITHUB_TOKEN</span><span class="pi">:</span> <span class="s">${{ secrets.GITHUB_TOKEN }}</span></code></pre></figure>
<h3 id="generating-a-pr-title-and-body">Generating a PR Title and Body</h3>
<p>At first I was hard-coding PR titles and commit messages. That’s not ideal. Compare the following PRs. The second version is much more specific!</p>
<p><img src="https://user-images.githubusercontent.com/542335/129234923-42116ea8-dee6-4247-a904-35862d67919a.png" alt="" /></p>
<p><img src="https://user-images.githubusercontent.com/542335/129370221-380b84a1-e65d-4da5-8bb3-83a9277b946f.png" alt="" /></p>
<p>This can be achieved by setting an environment variable during the workflow execution, appending it to <code class="language-plaintext highlighter-rouge">$GITHUB_ENV</code>, and reusing it in the PR.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">echo </span><span class="nv">REPOS_ADDED</span><span class="o">=</span><span class="si">$(</span>git diff <span class="nt">--unified</span><span class="o">=</span>0 .gitignore | <span class="nb">grep</span> <span class="s1">'+/'</span> | <span class="nb">cut</span> <span class="nt">-f2</span> <span class="nt">-d</span><span class="s1">'/'</span> | <span class="nb">paste</span> <span class="nt">-sd</span> <span class="s1">','</span> - | <span class="nb">sed</span> <span class="s2">"s/,/, /g"</span> | <span class="nb">sed</span> <span class="s1">'s/\(.*\),/\1 and/'</span><span class="si">)</span> <span class="o">>></span> <span class="nv">$GITHUB_ENV</span></code></pre></figure>
<ol>
<li>The workflow modifies <code class="language-plaintext highlighter-rouge">.gitignore</code> by adding lines to it, such as <code class="language-plaintext highlighter-rouge">/cross-cluster-replication/</code>.</li>
<li>List the added lines with <code class="language-plaintext highlighter-rouge">git diff --unified=0 .gitignore</code> and keep those containing <code class="language-plaintext highlighter-rouge">+/</code> using <code class="language-plaintext highlighter-rouge">| grep '+/'</code>.</li>
<li>Extract the name of each addition with <code class="language-plaintext highlighter-rouge">| cut -f2 -d'/'</code>, e.g. <code class="language-plaintext highlighter-rouge">cross-cluster-replication</code>.</li>
<li>Combine all additions into a comma-separated list with <code class="language-plaintext highlighter-rouge">| paste -sd ',' -</code>.</li>
<li>Add a space after each comma with <code class="language-plaintext highlighter-rouge">| sed "s/,/, /g"</code>.</li>
<li>Replace the last comma with <code class="language-plaintext highlighter-rouge">and</code> using <code class="language-plaintext highlighter-rouge">| sed 's/\(.*\),/\1 and/'</code>.</li>
<li>Append the result to <code class="language-plaintext highlighter-rouge">$GITHUB_ENV</code> as <code class="language-plaintext highlighter-rouge">REPOS_ADDED=...</code> with <code class="language-plaintext highlighter-rouge">echo REPOS_ADDED=$(...) >> $GITHUB_ENV</code>.</li>
</ol>
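<p>The pipeline above packs a lot into one line. As a rough equivalent for readers less familiar with these shell tools, here is a TypeScript sketch of the same transformation; it is not part of the workflow. The function name <code class="language-plaintext highlighter-rouge">formatRepoList</code> and the sample entries <code class="language-plaintext highlighter-rouge">job-scheduler</code> and <code class="language-plaintext highlighter-rouge">security</code> are made up for illustration, and the function takes the output of <code class="language-plaintext highlighter-rouge">git diff --unified=0 .gitignore</code> as a string rather than running git.</p>

```typescript
// Hypothetical helper mirroring the shell pipeline above. It takes the
// output of `git diff --unified=0 .gitignore` as a string.
function formatRepoList(diff: string): string {
  const names = diff
    .split("\n")
    // grep '+/': keep lines containing "+/", i.e. added entries like "+/cross-cluster-replication/"
    .filter((line) => line.includes("+/"))
    // cut -f2 -d'/': take the second "/"-separated field
    .map((line) => line.split("/")[1]);

  // paste -sd ',' - | sed "s/,/, /g": join the names with ", "
  const joined = names.join(", ");

  // sed 's/\(.*\),/\1 and/': replace the last comma with " and"
  const last = joined.lastIndexOf(",");
  return last === -1 ? joined : joined.slice(0, last) + " and" + joined.slice(last + 1);
}

// Made-up diff output for illustration.
const diff = [
  "diff --git a/.gitignore b/.gitignore",
  "--- a/.gitignore",
  "+++ b/.gitignore",
  "@@ -3,0 +4,3 @@",
  "+/cross-cluster-replication/",
  "+/job-scheduler/",
  "+/security/",
].join("\n");

console.log(formatRepoList(diff)); // cross-cluster-replication, job-scheduler and security
```

<p>Note that the diff header lines (<code class="language-plaintext highlighter-rouge">+++ b/.gitignore</code>, <code class="language-plaintext highlighter-rouge">@@ ... @@</code>) never contain <code class="language-plaintext highlighter-rouge">+/</code>, which is why the simple <code class="language-plaintext highlighter-rouge">grep</code> is enough.</p>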
<h3 id="make-a-pull-request">Make a Pull Request</h3>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Create Pull Request</span>
  <span class="na">id</span><span class="pi">:</span> <span class="s">cpr</span>
  <span class="na">uses</span><span class="pi">:</span> <span class="s">peter-evans/create-pull-request@v3</span>
  <span class="na">with</span><span class="pi">:</span>
    <span class="na">commit-message</span><span class="pi">:</span> <span class="s">Added ${{ env.REPOS_ADDED }}.</span>
    <span class="na">delete-branch</span><span class="pi">:</span> <span class="no">true</span>
    <span class="na">title</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Added</span><span class="nv"> </span><span class="s">${{</span><span class="nv"> </span><span class="s">env.REPOS_ADDED</span><span class="nv"> </span><span class="s">}}.'</span>
    <span class="na">body</span><span class="pi">:</span> <span class="pi">|</span>
      <span class="s">Added ${{ env.REPOS_ADDED }}.</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Check outputs</span>
  <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">echo "Pull Request Number - ${{ steps.cpr.outputs.pull-request-number }}"</span>
    <span class="s">echo "Pull Request URL - ${{ steps.cpr.outputs.pull-request-url }}"</span></code></pre></figure>
<h3 id="profit">Profit</h3>
<p>See <a href="https://github.com/opensearch-project/project-meta/pull/7">opensearch-project/project-meta#7</a> for an example.</p>
<p><a href="https://code.dblock.org/2021/08/13/automating-code-changes-with-github-actions-making-pull-requests.html">Automating Code Changes via GitHub Actions Making Pull Requests</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on August 13, 2021.</p>https://code.dblock.org/2021/06/15/running-github-actions-locally-using-act2021-06-15T00:00:00+00:002021-06-15T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>I’m a big fan of <a href="https://github.com/features/actions">GitHub Actions</a> to automate workflows. They are declarative in nature, developed as open-source components, and execution is container-based. I also recently learned that GitHub Actions are not actually tied to GitHub infrastructure and can be executed locally using <a href="https://github.com/nektos/act">act</a>.</p>
<p>Let’s build OpenSearch <a href="https://github.com/opensearch-project/job-scheduler">job-scheduler</a> on a local Linux machine.</p>
<h3 id="download-act">Download Act</h3>
<p>Download and install act from <a href="https://github.com/nektos/act#installation">here</a>. I just run the <code class="language-plaintext highlighter-rouge">install.sh</code> because YOLO.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl https://raw.githubusercontent.com/nektos/act/master/install.sh | <span class="nb">sudo </span>bash
</code></pre></div></div>
<h3 id="check-out-jobscheduler">Check Out JobScheduler</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone git@github.com:opensearch-project/job-scheduler.git
cd job-scheduler
</code></pre></div></div>
<h3 id="modify-workflow-temporary">Modify Workflow (Temporary)</h3>
<p>By default, the <code class="language-plaintext highlighter-rouge">runner</code> user that executes the workflow inside the Docker container does not have write access to the current folder, so running <a href="https://github.com/opensearch-project/job-scheduler/blob/v1.13.0.0/.github/workflows/test-and-build-workflow.yml">the workflow</a> as implemented requires one additional step. Add the following code locally in <code class="language-plaintext highlighter-rouge">.github/workflows/test-and-build-workflow.yml</code> after “Setup Java”.</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Chown</span>
  <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">sudo chown -R runner .</span>
</code></pre></div></div>
<h3 id="environment">Environment</h3>
<p>The current workflow implementation checks out OpenSearch and builds it, thus needing a token to <code class="language-plaintext highlighter-rouge">git clone</code> from GitHub. Create a <code class="language-plaintext highlighter-rouge">.secrets</code> file with a read-only GitHub token. This gets automatically picked up by act.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GITHUB_TOKEN=valid-token
</code></pre></div></div>
<h3 id="invoke-act">Invoke Act</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>act -r pull_request -P ubuntu-latest=catthehacker/ubuntu:runner-latest
</code></pre></div></div>
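<p>Act runs the job in a Docker container started from the open-source <code class="language-plaintext highlighter-rouge">catthehacker/ubuntu:runner-latest</code> image passed with <code class="language-plaintext highlighter-rouge">-P</code>. Enjoy a <code class="language-plaintext highlighter-rouge">BUILD SUCCESSFUL</code> result!</p>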
<p><a href="https://code.dblock.org/2021/06/15/running-github-actions-locally-using-act.html">How to run GitHub Actions Locally using Act</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on June 15, 2021.</p>https://code.dblock.org/2021/06/07/to-wrap-or-not-to-wrap-in-markdown2021-06-07T00:00:00+00:002021-06-07T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>I keep antagonizing OSS contributors trying to wrap text in Markdown files, e.g. <a href="https://github.com/opensearch-project/OpenSearch/pull/689#issuecomment-839241016">here</a> and <a href="https://github.com/opensearch-project/OpenSearch/pull/712#issuecomment-855271225">here</a>.</p>
<p>Should one wrap text in .markdown files at 80 columns or should one not?</p>
<p>First, let me say that I don’t care. Except that I do. Wrapped text in markdown really feeds my OCD in the worst possible way, right behind missing periods at the end of sentences, and two spaces. Oddly, I don’t care about tabs vs. spaces.</p>
<p>Here’s a logical argument for <em>not wrapping</em> text in markdown.</p>
<p>Markdown doesn’t render single line breaks: whether or not you break a line inside a paragraph, the rendered result is the same, unless you use two consecutive line breaks, which start a new paragraph.</p>
<p>For example, consider the following text wrapped at 23 characters for illustration purposes.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A quick brown fox jumps
over the lazy dog.
</code></pre></div></div>
<p>We swap “a” and “the”, producing the following new text.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The quick brown fox
jumps over a lazy dog.
</code></pre></div></div>
<p>Because of a line wrap, this 2-word change is now a 2-line change. It hurts.</p>
<p>Without the wrap the diff would have been super clean.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt; A quick brown fox jumps over the lazy dog.
&gt; The quick brown fox jumps over a lazy dog.
</code></pre></div></div>
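<p>To make the line-count argument concrete, here is a small TypeScript sketch. <code class="language-plaintext highlighter-rouge">countChangedLines</code> is a hypothetical, naive position-by-position comparison (not a real diff algorithm, which would also detect inserted and deleted lines), and <code class="language-plaintext highlighter-rouge">renderParagraph</code> approximates how Markdown treats single line breaks inside a paragraph by turning them into spaces.</p>

```typescript
// Naive comparison: counts line positions whose text differs between two versions.
// A simplification; real diff tools also handle inserted and deleted lines.
function countChangedLines(before: string, after: string): number {
  const a = before.split("\n");
  const b = after.split("\n");
  const n = Math.max(a.length, b.length);
  let changed = 0;
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) changed++;
  }
  return changed;
}

// Approximates Markdown paragraph rendering: single line breaks become spaces.
function renderParagraph(text: string): string {
  return text.split("\n").join(" ");
}

const wrappedBefore = "A quick brown fox jumps\nover the lazy dog.";
const wrappedAfter = "The quick brown fox\njumps over a lazy dog.";
const unwrappedBefore = "A quick brown fox jumps over the lazy dog.";
const unwrappedAfter = "The quick brown fox jumps over a lazy dog.";

console.log(countChangedLines(wrappedBefore, wrappedAfter)); // 2
console.log(countChangedLines(unwrappedBefore, unwrappedAfter)); // 1
console.log(renderParagraph(wrappedAfter) === unwrappedAfter); // true
```

<p>Both versions render to the same paragraph, but the wrapped one doubles the number of changed lines for the same 2-word edit.</p>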
<p>Furthermore, GitHub <a href="https://github.com/dblock/code.dblock.org/commit/99a4948a737ef21bf4025f3faa5b6167410e3de8">does an even better job at the 1-line diff</a>.</p>
<p><img src="https://code.dblock.org/images/posts/2021/2021-06-07-to-wrap-or-not-to-wrap-in-markdown/diff.png" alt="diff" /></p>
<p>Notice how the word “jumps” was highlighted, even though it wasn’t actually changed.</p>
<p>For an argument <em>for</em> wrapping text, see <a href="https://github.com/opensearch-project/OpenSearch/pull/712#issuecomment-855271225">this comment</a>.</p>
<p><a href="https://code.dblock.org/2021/06/07/to-wrap-or-not-to-wrap-in-markdown.html">To Wrap or Not to Wrap in Markdown?</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on June 07, 2021.</p>https://code.dblock.org/2021/04/27/walking-ethereum-transaction-logs-to-find-lost-robbies-using-etherscan-api2021-04-27T00:00:00+00:002021-04-27T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>On July 17, 2018, I <a href="https://www.youtube.com/watch?v=KT-gPtK5uHY&t=4h13m20s">spoke</a> at Christie’s first annual Tech Summit, entitled “Exploring Blockchain”, in London. I even got a freebie NFT!</p>
<p>During the event <a href="https://superrare.com/">SuperRare</a> partnered with <a href="https://www.artnome.com/about-artnome">Jason Bailey</a> and enlisted <a href="https://robbiebarrat.github.io/">Robbie Barrat</a>, the first artist to ever tokenize on SuperRare. Robbie created “AI Generated Nude Portrait #7” for the event, which he intended as 300 separate frames of a single artwork. Each of the 300 frames was tokenized separately and added to redeemable ETH gift cards with directions for how to claim the 1/1 token.</p>
<p>A small handful of these original NFTs are known to still exist. We’ll call them “Robbies”. On April 5th, 2021, <a href="https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-269-459">frame 269</a> sold for 125 ETH ($265K).</p>
<p>If you enjoyed the forensics in <a href="https://digitalartcollector.com/rare-lost-robbie-ai-nude-nfts-worth-millions-surface/">Rare “Lost Robbie” AI Nude NFTs Worth Millions Surface</a>, or if you just want to know how SuperRare or other marketplaces display token history, this post is for you. We’ll walk the Ethereum blockchain transaction logs to find all the Robbies using <a href="https://www.npmjs.com/package/etherscan-api">etherscan-api</a> (<a href="https://etherscan.io/apis">Etherscan API</a>).</p>
<p>Please do note that I am no expert, and that I would greatly appreciate suggestions and fixes to my approach and <a href="https://github.com/dblock/lost-robbies">the code</a>.</p>
<p>A freebie OBJKT NFT inspired by this project was also minted, available at <a href="https://objkt.com/asset/hicetnunc/53103">hicetnunc.xyz/objkt/53103</a>.</p>
<h3 id="getting-started">Getting Started</h3>
<p>First, get an <code class="language-plaintext highlighter-rouge">ETHERSCAN_API_KEY</code> from <a href="https://etherscan.io/myapikey">Etherscan</a> and save it to a file called <code class="language-plaintext highlighter-rouge">.env</code>. We’ll use <a href="https://www.npmjs.com/package/dotenv">dotenv</a> to automatically load it, and initialize <code class="language-plaintext highlighter-rouge">EtherscanApi</code> with this key.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">dotenv</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">dotenv</span><span class="dl">'</span><span class="p">;</span>
<span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">EtherscanApi</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">etherscan-api</span><span class="dl">'</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">api</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">init</span><span class="p">()</span> <span class="p">{</span>
  <span class="nx">dotenv</span><span class="p">.</span><span class="nx">config</span><span class="p">();</span>
  <span class="kd">var</span> <span class="nx">etherscanApiKey</span> <span class="o">=</span> <span class="nx">process</span><span class="p">.</span><span class="nx">env</span><span class="p">.</span><span class="nx">ETHERSCAN_API_KEY</span><span class="p">;</span>
  <span class="k">if</span> <span class="p">(</span><span class="o">!</span> <span class="nx">etherscanApiKey</span><span class="p">)</span> <span class="p">{</span> <span class="k">throw</span> <span class="k">new</span> <span class="nb">Error</span><span class="p">(</span><span class="dl">'</span><span class="s1">Missing ETHERSCAN_API_KEY</span><span class="dl">'</span><span class="p">)</span> <span class="p">}</span>
  <span class="nx">api</span> <span class="o">=</span> <span class="nx">EtherscanApi</span><span class="p">.</span><span class="nx">init</span><span class="p">(</span><span class="nx">etherscanApiKey</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">async</span> <span class="kd">function</span> <span class="nx">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="k">try</span> <span class="p">{</span>
    <span class="k">await</span> <span class="nx">init</span><span class="p">();</span>
    <span class="c1">// do something useful here</span>
  <span class="p">}</span> <span class="k">catch</span><span class="p">(</span><span class="nx">error</span><span class="p">)</span> <span class="p">{</span>
    <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">error</span><span class="p">)</span>
  <span class="p">}</span>
<span class="p">}</span>
<span class="nx">main</span><span class="p">();</span></code></pre></figure>
<p>The rest of the code goes somewhere into that <em>do something useful here</em> part above.</p>
<h3 id="who-is-robbie">Who is Robbie?</h3>
<p>Robbie Barrat, or <a href="https://superrare.com/videodrome">@videodrome</a>, is <a href="https://etherscan.io/address/0x860c4604fe1125ea43f81e613e7afb2aa49546aa">0x860c4604fe1125ea43f81e613e7afb2aa49546aa</a>. I found that address by following transaction links on SuperRare.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="kd">var</span> <span class="nx">balance</span> <span class="o">=</span> <span class="p">(</span><span class="k">await</span> <span class="nx">api</span><span class="p">.</span><span class="nx">account</span><span class="p">.</span><span class="nx">balance</span><span class="p">(</span><span class="dl">'</span><span class="s1">0x860c4604fe1125ea43f81e613e7afb2aa49546aa</span><span class="dl">'</span><span class="p">)).</span><span class="nx">result</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">Robbie has </span><span class="dl">"</span> <span class="o">+</span> <span class="p">(</span><span class="nx">balance</span> <span class="o">/</span> <span class="mi">1000000000000000000</span><span class="p">).</span><span class="nx">toFixed</span><span class="p">(</span><span class="mi">2</span><span class="p">).</span><span class="nx">toString</span><span class="p">()</span> <span class="o">+</span> <span class="dl">"</span><span class="s2"> ETH</span><span class="dl">"</span><span class="p">);</span></code></pre></figure>
<p>This says Robbie has earned 118.66 ETH (~$317K) from sales so far. Not bad.</p>
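<p>Etherscan returns the balance in wei (1 ETH = 10<sup>18</sup> wei) as a decimal string. Dividing by 10<sup>18</sup> in floating point, as above, is fine for a quick look, but large wei amounts cannot be represented exactly as JavaScript numbers. Here is a minimal sketch of a hypothetical <code class="language-plaintext highlighter-rouge">formatWeiAsEth</code> helper that formats the string with plain string operations instead; note that it truncates extra digits, whereas <code class="language-plaintext highlighter-rouge">toFixed</code> rounds.</p>

```typescript
// Hypothetical helper: formats a non-negative wei amount (decimal string) as ETH.
// 1 ETH = 10^18 wei; digits beyond `decimals` are truncated, not rounded.
function formatWeiAsEth(wei: string, decimals: number = 2): string {
  if (!/^\d+$/.test(wei)) {
    throw new Error(`Expected a non-negative integer string, got: ${wei}`);
  }
  // Pad so there is always at least one digit before the last 18 (fractional) digits.
  const padded = wei.padStart(19, "0");
  // Whole ETH part, with any leading zeros removed (but keep a single "0").
  const whole = padded.slice(0, -18).replace(/^0+(?=\d)/, "");
  // First `decimals` digits of the 18-digit fractional part.
  const fraction = padded.slice(-18).slice(0, decimals);
  return fraction.length > 0 ? `${whole}.${fraction}` : whole;
}

// Made-up balances for illustration.
console.log(formatWeiAsEth("118660000000000000000")); // 118.66
console.log(formatWeiAsEth("1500000000000000000", 3)); // 1.500
```
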
<h3 id="superrare-contract">SuperRare Contract</h3>
<p>Ethereum transactions execute methods that are written in a <em>contract</em>, which is basically a bunch of code that implements a set of known methods (an interface, e.g. <a href="https://github.com/ethereum/eips/issues/721">ERC721</a>). Like all code, methods have inputs and outputs. The SuperRare contract is <a href="https://etherscan.io/address/0x41a322b28d0ff354040e2cbc676f0320d8c8850d">0x41a322b28d0ff354040e2cbc676f0320d8c8850d</a>, also found by examining a transaction linked from SuperRare.</p>
<p>A contract’s interface is described in JSON by its application binary interface (ABI), which lists method and event names, their inputs and outputs, and other metadata. The contract code itself is stored on Ethereum as compiled bytecode, and method inputs and outputs are binary-encoded according to the ABI. We can fetch the ABI of a verified contract with <code class="language-plaintext highlighter-rouge">contract.getabi</code>.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="kd">var</span> <span class="nx">abi</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">api</span><span class="p">.</span><span class="nx">contract</span><span class="p">.</span><span class="nx">getabi</span><span class="p">(</span><span class="dl">'</span><span class="s1">0x41a322b28d0ff354040e2cbc676f0320d8c8850d</span><span class="dl">'</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">json</span> <span class="o">=</span> <span class="nx">JSON</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">abi</span><span class="p">.</span><span class="nx">result</span><span class="p">);</span></code></pre></figure>
<p>With the ABI you can also create an <code class="language-plaintext highlighter-rouge">InputDataDecoder</code> from <a href="https://www.npmjs.com/package/ethereum-input-data-decoder">ethereum-input-data-decoder</a> with <code class="language-plaintext highlighter-rouge">new InputDataDecoder(json)</code>, and use it to decode the input data of transactions executed against this contract.</p>
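<p>To see what ABI entries look like, here is a TypeScript sketch that lists each event’s canonical signature and its indexed inputs. The ABI fragment is hypothetical, written to match this post’s description of the SuperRare <code class="language-plaintext highlighter-rouge">Transfer</code> event (from and to indexed, token ID not indexed); it was not fetched from the contract. Per the Solidity ABI specification, a log’s first topic is the keccak-256 hash of a signature such as <code class="language-plaintext highlighter-rouge">Transfer(address,address,uint256)</code>, and the indexed inputs fill the remaining topics. Computing the hash needs a keccak library and is left out here.</p>

```typescript
// Minimal shapes for the ABI fields used here; an ABI is a JSON array of entries.
interface AbiParam {
  name: string;
  type: string;
  indexed?: boolean; // only meaningful on event inputs
}
interface AbiEntry {
  type: string; // "function", "event", "constructor", ...
  name?: string;
  inputs?: AbiParam[];
}

// Lists each event's canonical signature and the names of its indexed inputs.
function describeEvents(abi: AbiEntry[]): { signature: string; indexed: string[] }[] {
  return abi
    .filter((entry) => entry.type === "event")
    .map((entry) => {
      const inputs = entry.inputs ?? [];
      return {
        signature: `${entry.name}(${inputs.map((p) => p.type).join(",")})`,
        indexed: inputs.filter((p) => p.indexed === true).map((p) => p.name),
      };
    });
}

// Hypothetical ABI fragment for illustration (not fetched from the SuperRare contract).
const sampleAbi: AbiEntry[] = [
  {
    type: "event",
    name: "Transfer",
    inputs: [
      { name: "_from", type: "address", indexed: true },
      { name: "_to", type: "address", indexed: true },
      { name: "_tokenId", type: "uint256", indexed: false },
    ],
  },
  { type: "function", name: "totalSupply", inputs: [] },
];

console.log(JSON.stringify(describeEvents(sampleAbi)));
// [{"signature":"Transfer(address,address,uint256)","indexed":["_from","_to"]}]
```
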
<h3 id="transactions-and-logs">Transactions and Logs</h3>
<p>An Ethereum transaction, identified by its transaction hash, calls a contract method with input arguments. While it runs, the method can emit events, and each emitted event is recorded as a log entry. A log has up to four <em>topics</em>: the first is the hash of the event signature, and the rest are the event arguments declared as <em>indexed</em>; non-indexed arguments are stored in the log’s data field. Etherscan lets you query logs that belong to a certain contract using indexed topics.</p>
<p>For example, you can query logs for all method calls for the SuperRare contract.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="kd">var</span> <span class="nx">logs</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">api</span><span class="p">.</span><span class="nx">log</span><span class="p">.</span><span class="nx">getLogs</span><span class="p">(</span><span class="dl">'</span><span class="s1">0x41a322b28d0ff354040e2cbc676f0320d8c8850d</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">logs</span><span class="p">.</span><span class="nx">result</span> <span class="c1">// first page of a lot of logs</span></code></pre></figure>
<h3 id="dude-wheres-my-nft">Dude, where’s my NFT?</h3>
<p>So, what’s the address of my NFT? Well, there isn’t one.</p>
<p>The SuperRare contract <em>mints</em> a new NFT as a side effect of a call to the <code class="language-plaintext highlighter-rouge">addNewToken</code> method. The <code class="language-plaintext highlighter-rouge">addNewToken</code> call increments <code class="language-plaintext highlighter-rouge">totalSupply()</code> of tokens to obtain a new token ID. By convention, a mint is recorded as a <code class="language-plaintext highlighter-rouge">Transfer</code> event from address <code class="language-plaintext highlighter-rouge">0x00</code> to the caller.</p>
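<p>The mint convention can be checked mechanically. Below is a TypeScript sketch with a hypothetical <code class="language-plaintext highlighter-rouge">decodeMint</code> helper that recognizes a mint in a raw log entry. It assumes log entries shaped like Etherscan’s <code class="language-plaintext highlighter-rouge">getLogs</code> results (a <code class="language-plaintext highlighter-rouge">topics</code> array plus a hex <code class="language-plaintext highlighter-rouge">data</code> field) and, as noted further down in this post, that the token ID is not indexed, so it sits in <code class="language-plaintext highlighter-rouge">data</code>. <code class="language-plaintext highlighter-rouge">TRANSFER_TOPIC</code> is the well-known keccak-256 hash of <code class="language-plaintext highlighter-rouge">Transfer(address,address,uint256)</code>. The sample log is made up.</p>

```typescript
// Assumed subset of a log entry returned by Etherscan's getLogs.
interface LogEntry {
  topics: string[]; // topics[0] is the event signature hash, then indexed arguments
  data: string; // hex-encoded non-indexed arguments
}

// keccak-256 hash of "Transfer(address,address,uint256)".
const TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";
// The zero address, padded to a 32-byte topic.
const ZERO_TOPIC = "0x" + "0".repeat(64);

// Hypothetical helper: returns the recipient and token ID if the log records a mint, else null.
function decodeMint(log: LogEntry): { to: string; tokenId: number } | null {
  const [signature, from, to] = log.topics.map((topic) => topic.toLowerCase());
  if (signature !== TRANSFER_TOPIC || from !== ZERO_TOPIC || to === undefined) {
    return null;
  }
  return {
    // An address topic is the 20-byte address left-padded to 32 bytes: keep the last 40 hex digits.
    to: "0x" + to.slice(-40),
    // Assumes the token ID is the only non-indexed argument, so it fills the data field.
    tokenId: parseInt(log.data, 16),
  };
}

// Made-up log entry shaped like a mint of token 191 to videodrome's address.
const sampleLog: LogEntry = {
  topics: [
    TRANSFER_TOPIC,
    ZERO_TOPIC,
    "0x" + "0".repeat(24) + "860c4604fe1125ea43f81e613e7afb2aa49546aa",
  ],
  data: "0x" + (191).toString(16).padStart(64, "0"),
};

console.log(JSON.stringify(decodeMint(sampleLog)));
// {"to":"0x860c4604fe1125ea43f81e613e7afb2aa49546aa","tokenId":191}
```
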
<p>Take a look at <a href="https://etherscan.io/tx/0x397cf219aadb0e25afc7fcbb35f36ebccd8611375b5c7ad888e4cbacced2d7ea">the first Nude Portrait #7 token creation transaction</a>. The transaction hash was <code class="language-plaintext highlighter-rouge">0x397cf219aadb0e25afc7fcbb35f36ebccd8611375b5c7ad888e4cbacced2d7ea</code>. It called the <code class="language-plaintext highlighter-rouge">addNewToken</code> method with an <code class="language-plaintext highlighter-rouge">_uri</code> of <code class="language-plaintext highlighter-rouge">https://ipfs.pixura.io/ipfs/QmWkvzP1FZBrwBXjj3vD258RQm9MtV25G69zcqzYmc1cGd</code>, which serves the JSON metadata of frame #1.</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span>
  <span class="nl">"name"</span><span class="p">:</span> <span class="s2">"AI Generated Nude Portrait #7 Frame #1"</span><span class="p">,</span>
  <span class="nl">"description"</span><span class="p">:</span> <span class="s2">"Artwork generated by a GAN trained on thousands of nude portrait oil paintings."</span><span class="p">,</span>
  <span class="nl">"yearCreated"</span><span class="p">:</span> <span class="s2">"2018"</span><span class="p">,</span>
  <span class="nl">"createdBy"</span><span class="p">:</span> <span class="s2">"Robbie Barrat"</span><span class="p">,</span>
  <span class="nl">"tags"</span><span class="p">:</span> <span class="p">[</span>
    <span class="s2">"Nude Portrait"</span><span class="p">,</span>
    <span class="s2">"AI"</span><span class="p">,</span>
    <span class="s2">"Painting"</span><span class="p">,</span>
    <span class="s2">"Portrait"</span><span class="p">,</span>
    <span class="s2">"Generative"</span><span class="p">,</span>
    <span class="s2">"GAN"</span><span class="p">,</span>
    <span class="s2">"Machine Learning"</span><span class="p">,</span>
    <span class="s2">"Artificial Intelligence"</span><span class="p">,</span>
    <span class="s2">"Nude"</span><span class="p">,</span>
    <span class="s2">"Abstract"</span><span class="p">,</span>
    <span class="s2">"image.jpg"</span>
  <span class="p">],</span>
  <span class="nl">"image"</span><span class="p">:</span> <span class="s2">"https://ipfs.pixura.io/ipfs/QmaFkStftgA9rW9NyKUFyCKhAvKkENtD1CUfCkzEAWghyr"</span>
<span class="p">}</span></code></pre></figure>
<p>The output of this transaction was a <code class="language-plaintext highlighter-rouge">_tokenId</code> of <code class="language-plaintext highlighter-rouge">191</code>. Navigating to <a href="https://superrare.com/artwork/191">superrare.com/artwork/191</a> will incidentally show you the first frame from “Nude Portrait #7”.</p>
<h3 id="finding-create-transactions">Finding Create Transactions</h3>
<p>As described above, creating a token emits a <code class="language-plaintext highlighter-rouge">Transfer</code> event from address <code class="language-plaintext highlighter-rouge">0x00</code>. The <code class="language-plaintext highlighter-rouge">tokenId</code> argument, however, is not indexed, so I could not find a way to get the transaction that created a specific token, for example token number 191. I did figure out how to find all the transactions that generated the 300 tokens, by specifying the source address of <code class="language-plaintext highlighter-rouge">0x00</code> within the block range of the drop.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="nx">api</span><span class="p">.</span><span class="nx">log</span><span class="p">.</span><span class="nx">getLogs</span><span class="p">(</span>
  <span class="dl">'</span><span class="s1">0x41a322b28d0ff354040e2cbc676f0320d8c8850d</span><span class="dl">'</span><span class="p">,</span> <span class="c1">// contract address</span>
  <span class="dl">'</span><span class="s1">5977236</span><span class="dl">'</span><span class="p">,</span> <span class="c1">// fromBlock, from https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-1-191</span>
  <span class="dl">'</span><span class="s1">5977931</span><span class="dl">'</span><span class="p">,</span> <span class="c1">// toBlock, from https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-300-490</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic0</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic0_1_opr</span>
  <span class="dl">'</span><span class="s1">0x0000000000000000000000000000000000000000000000000000000000000000</span><span class="dl">'</span> <span class="c1">// topic1, from address (0x00 at creation)</span>
<span class="p">);</span></code></pre></figure>
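<p>The positional arguments above correspond to query parameters of Etherscan’s logs API. As a sketch, here is a hypothetical <code class="language-plaintext highlighter-rouge">buildGetLogsUrl</code> helper that builds the equivalent request URL by hand. It assumes the v1 endpoint <code class="language-plaintext highlighter-rouge">https://api.etherscan.io/api</code> with <code class="language-plaintext highlighter-rouge">module=logs</code> and <code class="language-plaintext highlighter-rouge">action=getLogs</code>, uses a placeholder API key, and only builds the URL without sending the request.</p>

```typescript
// Hypothetical helper; parameter names follow Etherscan's logs API (module=logs, action=getLogs).
function buildGetLogsUrl(
  apiKey: string,
  address: string,
  fromBlock: string,
  toBlock: string,
  topics: Record<string, string> = {}, // e.g. { topic1: "0x..." } or { topic0: "0x...", topic0_1_opr: "and", topic1: "0x..." }
): string {
  // Object key order is insertion order, so the query string keeps this order.
  const params: Record<string, string> = {
    module: "logs",
    action: "getLogs",
    address,
    fromBlock,
    toBlock,
    ...topics,
    apikey: apiKey,
  };
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://api.etherscan.io/api?${query}`;
}

// Same filter as the call above: logs in the block range whose topic1 is the zero address.
const url = buildGetLogsUrl(
  "YOUR_API_KEY", // placeholder
  "0x41a322b28d0ff354040e2cbc676f0320d8c8850d",
  "5977236",
  "5977931",
  { topic1: "0x" + "0".repeat(64) },
);
console.log(url);
```

<p>Seeing the raw parameters makes it easier to tell which <code class="language-plaintext highlighter-rouge">null</code> in the positional call stands for which topic or operator.</p>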
<h3 id="examining-a-transaction">Examining a Transaction</h3>
<p>Now that we have a collection of 300 logs, we can, for each <code class="language-plaintext highlighter-rouge">log</code>, fetch the corresponding transaction, decode its input data, identify the method called, etc.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="kd">var</span> <span class="nx">tx</span> <span class="o">=</span> <span class="p">(</span><span class="k">await</span> <span class="nx">api</span><span class="p">.</span><span class="nx">proxy</span><span class="p">.</span><span class="nx">eth_getTransactionByHash</span><span class="p">(</span><span class="nx">log</span><span class="p">.</span><span class="nx">transactionHash</span><span class="p">)).</span><span class="nx">result</span><span class="p">;</span>
<span class="c1">// _uri: https://ipfs.pixura.io/ipfs/QmWkvzP1FZBrwBXjj3vD258RQm9MtV25G69zcqzYmc1cGd</span>
<span class="kd">const</span> <span class="nx">decodedInputData</span> <span class="o">=</span> <span class="nx">inputDataDecoder</span><span class="p">.</span><span class="nx">decodeData</span><span class="p">(</span><span class="nx">tx</span><span class="p">.</span><span class="nx">input</span><span class="p">);</span>
<span class="c1">// addNewToken</span>
<span class="kd">const</span> <span class="nx">method</span> <span class="o">=</span> <span class="nx">decodedInputData</span><span class="p">.</span><span class="nx">method</span><span class="p">;</span> </code></pre></figure>
<h3 id="first-transfers">First Transfers</h3>
<p>After creation, the tokens were transferred from @videodrome’s address into newly created wallets. Those transfers are indexed by the sender’s address. Again, because the <code class="language-plaintext highlighter-rouge">tokenId</code> is not indexed in these transactions, I couldn’t figure out how to query all transfer logs for a single token, but we can get the entire set.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="nx">api</span><span class="p">.</span><span class="nx">log</span><span class="p">.</span><span class="nx">getLogs</span><span class="p">(</span>
  <span class="dl">'</span><span class="s1">0x41a322b28d0ff354040e2cbc676f0320d8c8850d</span><span class="dl">'</span><span class="p">,</span> <span class="c1">// contract address</span>
  <span class="dl">'</span><span class="s1">5977931</span><span class="dl">'</span><span class="p">,</span> <span class="c1">// fromBlock, from https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-1-191</span>
  <span class="dl">'</span><span class="s1">5979502</span><span class="dl">'</span><span class="p">,</span> <span class="c1">// toBlock, from https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-300-490</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic0</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic0_1_opr</span>
  <span class="dl">'</span><span class="s1">0x000000000000000000000000860c4604fe1125ea43f81e613e7afb2aa49546aa</span><span class="dl">'</span> <span class="c1">// topic1, videodrome's address</span>
<span class="p">);</span></code></pre></figure>
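<p>Topic values are always 32 bytes, so a 20-byte address such as videodrome’s has to be left-padded with 24 zero hex digits, which is what the last argument above spells out by hand. A pair of hypothetical helpers for converting in both directions:</p>

```typescript
// Left-pads a 20-byte address (40 hex digits) to a 32-byte topic (64 hex digits).
function addressToTopic(address: string): string {
  const hex = address.toLowerCase().replace(/^0x/, "");
  if (!/^[0-9a-f]{40}$/.test(hex)) {
    throw new Error(`Not a 20-byte hex address: ${address}`);
  }
  return "0x" + hex.padStart(64, "0");
}

// Recovers the address from a 32-byte topic by keeping its last 40 hex digits.
function topicToAddress(topic: string): string {
  const hex = topic.toLowerCase().replace(/^0x/, "");
  if (!/^[0-9a-f]{64}$/.test(hex)) {
    throw new Error(`Not a 32-byte hex topic: ${topic}`);
  }
  return "0x" + hex.slice(-40);
}

const videodrome = "0x860c4604fe1125ea43f81e613e7afb2aa49546aa";
console.log(addressToTopic(videodrome)); // "0x" + 24 zeros + the 40 hex digits of the address
console.log(topicToAddress(addressToTopic(videodrome)) === videodrome); // true
```
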
<h3 id="sales-and-bids">Sales and Bids</h3>
<p>Reading the contract shows that <code class="language-plaintext highlighter-rouge">bid</code>, <code class="language-plaintext highlighter-rouge">acceptBid</code> and <code class="language-plaintext highlighter-rouge">buy</code> event logs are indexed by <code class="language-plaintext highlighter-rouge">tokenId</code> as the third indexed argument, <code class="language-plaintext highlighter-rouge">topic3</code> (<code class="language-plaintext highlighter-rouge">topic0</code> is the event signature).</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="c1">// e.g. '0x0000000000000000000000000000000000000000000000000000000000000126'</span>
<span class="kd">var</span> <span class="nx">topic</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">0x</span><span class="dl">'</span> <span class="o">+</span> <span class="nx">tokenId</span><span class="p">.</span><span class="nx">toString</span><span class="p">(</span><span class="mi">16</span><span class="p">).</span><span class="nx">padStart</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="dl">'</span><span class="s1">0</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">api</span><span class="p">.</span><span class="nx">log</span><span class="p">.</span><span class="nx">getLogs</span><span class="p">(</span>
  <span class="dl">'</span><span class="s1">0x41a322b28d0ff354040e2cbc676f0320d8c8850d</span><span class="dl">'</span><span class="p">,</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// fromBlock</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// toBlock</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic0</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic0_1_opr</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic1</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic1_2_opr</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic2</span>
  <span class="kc">null</span><span class="p">,</span> <span class="c1">// topic2_3_opr</span>
  <span class="nx">topic</span><span class="p">,</span> <span class="c1">// topic3, tokenId</span>
  <span class="kc">null</span>
<span class="p">);</span></code></pre></figure>
<p>Similarly, <code class="language-plaintext highlighter-rouge">setSalePrice</code> is indexed by <code class="language-plaintext highlighter-rouge">tokenId</code> as the first indexed argument, <code class="language-plaintext highlighter-rouge">topic1</code>.</p>
<figure class="highlight"><pre><code class="language-typescript" data-lang="typescript"><span class="nx">api</span><span class="p">.</span><span class="nx">log</span><span class="p">.</span><span class="nx">getLogs</span><span class="p">(</span>
<span class="dl">'</span><span class="s1">0x41a322b28d0ff354040e2cbc676f0320d8c8850d</span><span class="dl">'</span><span class="p">,</span>
<span class="kc">null</span><span class="p">,</span> <span class="c1">// fromBlock</span>
<span class="kc">null</span><span class="p">,</span> <span class="c1">// toBlock</span>
<span class="kc">null</span><span class="p">,</span> <span class="c1">// topic0</span>
<span class="kc">null</span><span class="p">,</span> <span class="c1">// topic0_1_opr</span>
<span class="nx">topic</span> <span class="c1">// topic1</span>
<span class="p">);</span></code></pre></figure>
<p>Decoding the topics and fields of these logs tells us, for example, the sale price that was set (<code class="language-plaintext highlighter-rouge">parseInt(log.topics[2], 16)</code>) or the transaction timestamp (<code class="language-plaintext highlighter-rouge">moment.unix(parseInt(log.timeStamp, 16))</code>).</p>
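<p>Once a value exceeds <code class="language-plaintext highlighter-rouge">Number.MAX_SAFE_INTEGER</code>, <code class="language-plaintext highlighter-rouge">parseInt</code> only returns an approximate float. That is fine for display, but <code class="language-plaintext highlighter-rouge">BigInt</code> keeps the integer arithmetic exact. Below is a minimal sketch of the topic encoding and decoding with <code class="language-plaintext highlighter-rouge">BigInt</code>. It is not the code in the repo: the helper names are hypothetical, and it assumes the price topic holds an amount in wei (18 decimals) and that <code class="language-plaintext highlighter-rouge">timeStamp</code> is a hex string of Unix seconds, as the <code class="language-plaintext highlighter-rouge">parseInt(..., 16)</code> calls above suggest.</p>

```typescript
// Hypothetical helpers sketching log topic encoding/decoding with BigInt.
// Assumption: amounts are in wei (18 decimals), timestamps are hex Unix seconds.

const WEI_PER_ETH = BigInt("1000000000000000000"); // 10^18 wei per ETH

// Encode a token ID as a 32-byte topic: "0x" followed by 64 hex digits,
// equivalent to the toString(16).padStart(64, '0') approach above.
function tokenIdToTopic(tokenId: number | bigint): string {
  return "0x" + BigInt(tokenId).toString(16).padStart(64, "0");
}

// Decode a 32-byte topic holding a wei amount into ETH.
// Integer division separates whole ETH from the wei remainder
// before either part is converted to a floating-point number.
function topicToEth(topic: string): number {
  const wei = BigInt(topic); // BigInt() parses "0x"-prefixed hex strings
  const whole = wei / WEI_PER_ETH;
  const fraction = wei % WEI_PER_ETH;
  return Number(whole) + Number(fraction) / 1e18;
}

// Convert a hex Unix timestamp (in seconds) into a JavaScript Date.
function hexTimestampToDate(hex: string): Date {
  return new Date(Number(BigInt(hex)) * 1000);
}
```

<p>With these helpers, <code class="language-plaintext highlighter-rouge">topicToEth(log.topics[2])</code> would stand in for the <code class="language-plaintext highlighter-rouge">parseInt</code> call when printing prices.</p>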
<h3 id="putting-it-all-together">Putting It All Together</h3>
<p>The complete code for this blog post is <a href="https://github.com/dblock/lost-robbies">here</a>. It fetches and stores the initial create transactions, then the subsequent transfer transactions, and finally all the sales transactions.</p>
<p>Run <code class="language-plaintext highlighter-rouge">npm run update</code> to fetch new data and cache it locally, and <code class="language-plaintext highlighter-rouge">npm run sales</code> to show the most recent sales.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">frame 13 sold <span class="k">for </span>100.888 ETH on Sat Apr 10 2021 00:40:21 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-13-203
frame 24 sold <span class="k">for </span>0.100 ETH on Fri Jul 20 2018 10:32:22 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-24-214
frame 44 was listed <span class="k">for </span>sale <span class="k">for </span>350.000 ETH on Sun Apr 25 2021 16:42:41 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-44-234
sold <span class="k">for </span>110.000 ETH on Mon Apr 19 2021 14:17:32 GMT-0400
frame 45 sold <span class="k">for </span>100.888 ETH on Fri Apr 09 2021 15:38:00 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-45-235
frame 53 was listed <span class="k">for </span>sale <span class="k">for </span>2500.000 ETH on Wed Mar 24 2021 21:44:31 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-53-243
frame 65 was listed <span class="k">for </span>sale <span class="k">for </span>545.000 ETH on Sun Apr 04 2021 22:14:41 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-65-255
sold <span class="k">for </span>47.000 ETH on Sun Apr 04 2021 08:32:42 GMT-0400
frame 78 was listed <span class="k">for </span>sale <span class="k">for </span>222.000 ETH on Fri Apr 23 2021 17:37:51 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-78-268
frame 92 was listed <span class="k">for </span>sale <span class="k">for </span>122.000 ETH on Mon Apr 26 2021 16:58:29 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-92-282
sold <span class="k">for </span>50.000 ETH on Mon Apr 05 2021 13:50:40 GMT-0400
frame 101 was listed <span class="k">for </span>sale <span class="k">for </span>5555.000 ETH on Fri Mar 12 2021 07:11:07 GMT-0500 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-101-291
sold <span class="k">for </span>1.500 ETH on Mon Dec 02 2019 13:46:32 GMT-0500
frame 104 was listed <span class="k">for </span>sale <span class="k">for </span>1000.000 ETH on Mon Mar 15 2021 16:12:56 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-104-294
sold <span class="k">for </span>19.000 ETH on Thu Jun 11 2020 16:28:09 GMT-0400
frame 149 was listed <span class="k">for </span>sale <span class="k">for </span>888.000 ETH on Wed Mar 24 2021 08:29:05 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-149-339
sold <span class="k">for </span>35.000 ETH on Tue Aug 04 2020 02:06:33 GMT-0400
frame 153 sold <span class="k">for </span>16.500 ETH on Wed Jan 01 2020 15:37:56 GMT-0500 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-153-343
frame 165 was listed <span class="k">for </span>sale <span class="k">for </span>2000.000 ETH on Mon Apr 05 2021 15:52:17 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-165-355
frame 166 was listed <span class="k">for </span>sale <span class="k">for </span>2200.000 ETH on Mon Apr 19 2021 22:24:50 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-166-356
sold <span class="k">for </span>80.000 ETH on Sat Apr 03 2021 03:56:19 GMT-0400
frame 175 sold <span class="k">for </span>0.001 ETH on Sat Jul 11 2020 23:54:29 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-175-365
sold <span class="k">for </span>0.001 ETH on Sat Jul 11 2020 23:37:43 GMT-0400
sold <span class="k">for </span>0.001 ETH on Sat Jul 11 2020 23:30:37 GMT-0400
sold <span class="k">for </span>0.001 ETH on Sat Jul 11 2020 23:14:22 GMT-0400
sold <span class="k">for </span>21.000 ETH on Mon Jun 29 2020 22:48:05 GMT-0400
sold <span class="k">for </span>1.500 ETH on Sat Dec 21 2019 00:09:56 GMT-0500
frame 179 was listed <span class="k">for </span>sale <span class="k">for </span>299.000 ETH on Mon Apr 12 2021 21:17:18 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-179-369
frame 206 sold <span class="k">for </span>60.000 ETH on Wed Apr 07 2021 15:40:30 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-206-396
frame 269 sold <span class="k">for </span>125.000 ETH on Mon Apr 05 2021 16:36:12 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-269-459
frame 275 was listed <span class="k">for </span>sale <span class="k">for </span>885.000 ETH on Mon Apr 19 2021 22:25:38 GMT-0400 | https://superrare.com/artwork/ai-generated-nude-portrait-7-frame-275-465
sold <span class="k">for </span>50.000 ETH on Wed Apr 07 2021 19:29:54 GMT-0400</code></pre></figure>
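<p>The output above groups events by frame in ascending order and lists each frame’s events newest first, prefixing only the first line with the frame number. Here is a minimal sketch of that formatting step. The <code class="language-plaintext highlighter-rouge">SaleEvent</code> shape and <code class="language-plaintext highlighter-rouge">formatSales</code> are hypothetical (the repo’s data model may differ), the SuperRare URL is left out, and <code class="language-plaintext highlighter-rouge">Date#toString</code> also appends a time zone name that the output above does not show.</p>

```typescript
// Hypothetical shape of a cached sale or listing event.
type SaleEvent = {
  frame: number;
  kind: "sold" | "listed";
  eth: number;
  at: Date;
};

// Group events by frame (ascending) and print each frame's events newest first.
// Only the first line of each frame carries the "frame N" prefix.
function formatSales(events: SaleEvent[]): string[] {
  const byFrame = new Map<number, SaleEvent[]>();
  for (const e of events) {
    const list = byFrame.get(e.frame) || [];
    list.push(e);
    byFrame.set(e.frame, list);
  }

  const lines: string[] = [];
  const frames = Array.from(byFrame.keys()).sort((a, b) => a - b);
  for (const frame of frames) {
    const sorted = byFrame
      .get(frame)!
      .slice()
      .sort((a, b) => b.at.getTime() - a.at.getTime()); // newest first
    sorted.forEach((e, i) => {
      const what = e.kind === "sold" ? "sold" : "was listed for sale";
      const line = `${what} for ${e.eth.toFixed(3)} ETH on ${e.at.toString()}`;
      lines.push(i === 0 ? `frame ${frame} ${line}` : line);
    });
  }
  return lines;
}
```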
<p><a href="https://code.dblock.org/2021/04/27/walking-ethereum-transaction-logs-to-find-lost-robbies-using-etherscan-api.html">Walking Ethereum Transaction Logs to Find Lost Robbies w/Etherscan API</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on April 27, 2021.</p>https://code.dblock.org/2021/04/16/adding-work-email-to-a-gpg-key-and-signing-git-commits2021-04-16T00:00:00+00:002021-04-16T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>Last week I joined the <a href="https://opensearch.org/">OpenSearch Team</a> at AWS, a community-driven, open source fork of Elasticsearch and Kibana (read more about it <a href="https://aws.amazon.com/blogs/opensource/introducing-opensearch/">here</a>).</p>
<p>Security is always our top priority at AWS, so I had to learn some new development best practices in this area. One of my colleagues, Apache contributor <a href="https://github.com/nknize">@nknize</a>, has been signing his commits with GPG. I decided to add my work e-mail address to my existing GPG key, and set up git commit signing as well.</p>
<h3 id="generating-keys">Generating Keys</h3>
<p>If you don’t already have a key, install <a href="https://gnupg.org/download/">gpg2</a> (e.g. <code class="language-plaintext highlighter-rouge">brew install gpg</code>), and follow the instructions in <a href="https://docs.github.com/en/github/authenticating-to-github/generating-a-new-gpg-key">this doc</a>. It will tell you to run <code class="language-plaintext highlighter-rouge">gpg --full-generate-key</code>.</p>
<p>You can list keys with <code class="language-plaintext highlighter-rouge">gpg --list-secret-keys --keyid-format LONG</code> and note the key ID.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gpg <span class="nt">--list-secret-keys</span> <span class="nt">--keyid-format</span> LONG
/Users/dblock/.gnupg/pubring.kbx
<span class="nt">--------------------------------</span>
sec rsa2048/75BF031B7C94E183 2013-12-24 <span class="o">[</span>SC]
4A720FE790B07A68744E371675BF031B7C94E183
uid <span class="o">[</span>ultimate] Daniel Doubrovkine <dblock[at]dblock.org></code></pre></figure>
<p>In my example the key ID is <code class="language-plaintext highlighter-rouge">75BF031B7C94E183</code>.</p>
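<p>If you want to script this step, the key ID can be pulled out of the listing above. This is a minimal sketch with a hypothetical <code class="language-plaintext highlighter-rouge">secretKeyIds</code> helper that assumes the human-readable layout shown above (<code class="language-plaintext highlighter-rouge">sec</code>, algorithm, slash, 16 hex digits). For anything long-lived, gpg’s <code class="language-plaintext highlighter-rouge">--with-colons</code> output is the format meant for machine parsing.</p>

```typescript
// Hypothetical helper: extract LONG key IDs from the output of
// `gpg --list-secret-keys --keyid-format LONG`.
// Only "sec" (primary secret key) lines are matched, so "ssb" subkeys are skipped.
function secretKeyIds(output: string): string[] {
  const ids: string[] = [];
  // e.g. "sec   rsa2048/75BF031B7C94E183 2013-12-24 [SC]"
  const re = /^\s*sec\s+[^\/\s]+\/([0-9A-F]{16})\b/gm;
  let match: RegExpExecArray | null;
  while ((match = re.exec(output)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}
```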
<h3 id="backing-up-keys">Backing up Keys</h3>
<p>I export a copy of my GPG keys to Dropbox and store the private key passphrase in 1Password. The passphrase is required to export or import a private key (gpg will prompt you for it).</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gpg <span class="nt">--export-secret-key</span> 75BF031B7C94E183 <span class="o">></span> 75BF031B7C94E183.gpg</code></pre></figure>
<h3 id="adding-my-work-e-mail">Adding my Work E-Mail</h3>
<p>I only have one identity, but multiple e-mails. I decided to add my work e-mail to my GPG key (YMMV) as explained <a href="https://docs.github.com/en/github/authenticating-to-github/associating-an-email-with-your-gpg-key">here</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gpg <span class="nt">--edit-key</span> 75BF031B7C94E183
<span class="nv">$ </span>gpg> adduid
<span class="c"># follow prompts, finish with `save`</span></code></pre></figure>
<p>My key now has both my personal and work e-mail addresses.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>gpg <span class="nt">--list-secret-keys</span> <span class="nt">--keyid-format</span> LONG
/Users/dblock/.gnupg/pubring.kbx
<span class="nt">--------------------------------</span>
sec rsa2048/75BF031B7C94E183 2013-12-24 <span class="o">[</span>SC]
4A720FE790B07A68744E371675BF031B7C94E183
uid <span class="o">[</span>ultimate] Daniel Doubrovkine <dblock[at]amazon.com>
uid <span class="o">[</span>ultimate] Daniel Doubrovkine <dblock[at]dblock.org>
ssb rsa2048/960955779E55310A 2013-12-24 <span class="o">[</span>E]</code></pre></figure>
<p>I then exported the public key with <code class="language-plaintext highlighter-rouge">gpg -a --export 75BF031B7C94E183</code> and <a href="https://docs.github.com/en/authentication/managing-commit-signature-verification/adding-a-gpg-key-to-your-github-account">added it to my GitHub account</a>.</p>
<h3 id="signing-git-commits">Signing Git Commits</h3>
<p>I wanted to enable commit signing globally to avoid having to constantly append <code class="language-plaintext highlighter-rouge">-S</code> to <code class="language-plaintext highlighter-rouge">git commit</code>, and <a href="https://github.com/dblock/dotfiles/commit/073adde3335182ce33625951c84a8431adea8256">added the following settings to my dotfiles</a>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># make GPG work</span>
<span class="nb">export </span><span class="nv">GPG_TTY</span><span class="o">=</span><span class="si">$(</span><span class="nb">tty</span><span class="si">)</span>
<span class="c"># use my key to sign all commits</span>
git config <span class="nt">--global</span> user.signingkey 75BF031B7C94E183
<span class="c"># automatically sign all commits</span>
git config <span class="nt">--global</span> commit.gpgsign <span class="nb">true</span></code></pre></figure>
<h3 id="checking-it-out">Checking it Out</h3>
<p>Commit signatures appear in <code class="language-plaintext highlighter-rouge">git log --show-signature</code>.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">~/source/dotfiles <span class="o">(</span>master<span class="o">)</span><span class="nv">$ </span>git log <span class="nt">--show-signature</span> <span class="nt">-1</span>
commit 073adde3335182ce33625951c84a8431adea8256 <span class="o">(</span>HEAD -> master, origin/master, origin/HEAD<span class="o">)</span>
gpg: Signature made Thu Apr 15 18:19:41 2021 EDT
gpg: using RSA key 4A720FE790B07A68744E371675BF031B7C94E183
gpg: Good signature from <span class="s2">"Daniel Doubrovkine <dblock[at]amazon.com>"</span> <span class="o">[</span>ultimate]
gpg: aka <span class="s2">"Daniel Doubrovkine <dblock[at]dblock.org>"</span> <span class="o">[</span>ultimate]
Author: dblock <dblock[at]amazon.com>
Date: Thu Apr 15 18:19:41 2021 <span class="nt">-0400</span>
Installing GPG keys.</code></pre></figure>
<p>And you can see a nice icon next to verified commits on GitHub!</p>
<p><img src="https://code.dblock.org/images/posts/2021/2021-04-16-adding-work-email-to-a-gpg-key-and-signing-git-commits/verified.gif" alt="verified" /></p>
<p>Now, how do I get verified <a href="https://twitter.com/dblockdotorg">on Twitter</a>?!</p>
<h3 id="passphrase">Passphrase</h3>
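<p>I find it annoying to have to re-enter the passphrase every few minutes. Put the following into <code class="language-plaintext highlighter-rouge">~/.gnupg/gpg-agent.conf</code> to cache it for a day. The <code class="language-plaintext highlighter-rouge">default-cache-ttl</code> timer is reset every time the cached passphrase is used, while <code class="language-plaintext highlighter-rouge">max-cache-ttl</code> caps its total lifetime (2 hours by default), so both need to be raised.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">default-cache-ttl 86400
max-cache-ttl 86400</code></pre></figure>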
<p>Restart <code class="language-plaintext highlighter-rouge">gpg-agent</code> with <code class="language-plaintext highlighter-rouge">gpgconf --kill gpg-agent</code>; gpg starts it again automatically the next time it is needed.</p>
<h3 id="new-computer">New Computer</h3>
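<p>Import the key and its owner trust (a file produced by <code class="language-plaintext highlighter-rouge">gpg --export-ownertrust</code>) on a new computer, then configure git to sign commits with it.</p>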
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gpg <span class="nt">--import</span> ~/Dropbox/Personal/7C94E183.gpg
gpg <span class="nt">--import-ownertrust</span> < ~/Dropbox/Personal/7C94E183.trustlevel.txt
git config <span class="nt">--global</span> user.signingkey 75BF031B7C94E183
git config <span class="nt">--global</span> commit.gpgsign <span class="nb">true</span></code></pre></figure>
<p>If you get an error <code class="language-plaintext highlighter-rouge">gpg: no valid OpenPGP data found.</code> and <code class="language-plaintext highlighter-rouge">gpg: Total number processed: 0</code>, this is GPG’s obtuse way of telling you that the contents of the file you’re trying to import are invalid. In my case <code class="language-plaintext highlighter-rouge">gpg --import ~/Dropbox/Personal/7C94E183.gpg</code> was failing because the file had not been synced to my local drive from Dropbox.</p>
<h3 id="troubleshooting">Troubleshooting</h3>
<p>If you’re having trouble with gpg, try <code class="language-plaintext highlighter-rouge">echo "test" | gpg --clearsign</code> to get a better error. If it complains that <code class="language-plaintext highlighter-rouge">gpg-agent</code> is not started, run <code class="language-plaintext highlighter-rouge">gpg-agent --daemon</code> and correct any errors it reports.</p>
<p><a href="https://code.dblock.org/2021/04/16/adding-work-email-to-a-gpg-key-and-signing-git-commits.html">Adding Work E-Mail to a GPG Key and Signing Git Commits</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on April 16, 2021.</p>https://code.dblock.org/2021/03/20/how-i-minted-my-first-generative-art-nft2021-03-20T00:00:00+00:002021-03-20T00:00:00+00:00Daniel Doubrovkinehttps://code.dblock.orgdblock@dblock.org<p>I’ve long watched my inspiring friends make generative art and mint NFTs. I own a handful of works on paper by <a href="https://linktr.ee/dmitricherniak">Dmitri Cherniak</a>, and we’ve <a href="https://www.instagram.com/p/BtuX53IHMBx/">briefly collaborated in 2019</a> - Dmitri made digital works, and I drew a smaller set inspired by his output. I really enjoyed observing his process, and thought about trying making digital drawings myself, but then I stubbornly stuck to making my own works on paper. Dmitri’s recent success with <a href="https://opensea.io/collection/ringers-by-dmitri-cherniak">Ringers</a> selling at crazy prices was not overnight. He has long made, and believed in generative art, he’s a true artist that doesn’t care much about commercial success. Nevertheless, the $ outcomes are worthy of a mention, his art is now being recognized by collectors outside of the traditional gallery system.</p>
<p>Last week, after listening to a conversation with <a href="https://twitter.com/osinachiart">Osinachi</a>, an incredibly inspiring Nigerian artist who has been minting NFTs for a while, I finally decided to give generative art NFTs a go.</p>
<p>You can buy one of the works <a href="https://opensea.io/collection/generative-sanguines">here</a>, until they sell out. <strong>Update</strong>: works 1-6 have sold as of March 24th, 2021.</p>
<p><img src="https://raw.githubusercontent.com/dblock/p5art/master/shape.gif" alt="" /></p>
<p>Here are the technicalities of how I minted my first NFTs.</p>
<ol>
<li>I started with <a href="https://github.com/Gaweph/p5-typescript-starter">p5-typescript-starter</a>, including <a href="https://github.com/Gaweph/p5-typescript-starter/pull/14">fixing a typo</a>.</li>
<li>Found inspiration in my own <a href="https://www.instagram.com/p/B9etMYGnMQ3/">existing paper drawing</a>.</li>
<li>Reproduced the sanguine (<code class="language-plaintext highlighter-rouge">color('#850505')</code>) shape <a href="https://github.com/dblock/p5art/blob/master/sketch/sketch.ts#L28">in code</a>, using <code class="language-plaintext highlighter-rouge">quad</code>.</li>
<li>Animated 10 frames and saved them to files using <code class="language-plaintext highlighter-rouge">saveCanvas</code>.</li>
<li>Created <a href="https://opensea.io/collection/generative-sanguines">a collection on OpenSea</a>. Each work is a unique result of a different frame.</li>
<li>Bought $200 worth of ETH on Coinbase and moved it to OpenSea to pay the gas for a one-time wallet initialization transaction.</li>
<li>Listed <a href="https://opensea.io/assets/ethereum/0x495f947276749ce646f68ac8c248420045cb7b5e/48718886585399041049872855307944290111042886289234588241181420742469385977857">work 1/10</a> in an auction, and sold my first NFT to my first bidder, someone I don’t know!</li>
<li>In hindsight, listing as an auction was a mistake because the seller ends up paying the gas fees, which are high (about $50 per transaction). I have since listed, and sold, works 2-6 as “buy now” at a fixed price; for those, the buyer pays the gas fee to complete the sale.</li>
</ol>
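<p>Steps 3 and 4 boil down to drawing one quad per frame, with each frame slightly different. The sketch below shows that idea without p5 itself, so it is not the code in <a href="https://github.com/dblock/p5art">dblock/p5art</a>: the base corner positions, the jitter amount, the seeded <code class="language-plaintext highlighter-rouge">mulberry32</code> PRNG and the <code class="language-plaintext highlighter-rouge">frameQuads</code> name are all hypothetical. A seeded generator makes every frame unique yet reproducible, which matters if each frame becomes a separate work.</p>

```typescript
// Hypothetical sketch: one jittered quad per frame, reproducible from a seed.
// The 8 numbers per quad are x1, y1, x2, y2, x3, y3, x4, y4.
type Quad = [number, number, number, number, number, number, number, number];

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Illustrative base corners as fractions of a square canvas of side `size`.
const BASE: Quad = [0.2, 0.2, 0.8, 0.25, 0.75, 0.8, 0.25, 0.7];

function frameQuads(frames: number, size: number, seed: number): Quad[] {
  const rand = mulberry32(seed);
  const quads: Quad[] = [];
  for (let f = 0; f < frames; f++) {
    // Move each coordinate by up to +/-5% of the canvas, then scale to pixels.
    const q = BASE.map((v) => (v + (rand() - 0.5) * 0.1) * size) as Quad;
    quads.push(q);
  }
  return quads;
}
```

<p>Inside a p5 sketch you would then, for frame <code class="language-plaintext highlighter-rouge">i</code>, call <code class="language-plaintext highlighter-rouge">fill(color('#850505'))</code>, <code class="language-plaintext highlighter-rouge">quad(...q)</code> and <code class="language-plaintext highlighter-rouge">saveCanvas('frame-' + i, 'png')</code>.</p>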
<p><a href="https://code.dblock.org/2021/03/20/how-i-minted-my-first-generative-art-nft.html">How I Minted My First Generative Art NFT</a> was originally published by Daniel Doubrovkine at <a href="https://code.dblock.org">code.dblock.org | tech blog</a> on March 20, 2021.</p>