Generate embeddings with nomic-embed-text-v1 using Elixir Bumblebee


Run in Livebook

TL;DR: run the Livebook example by clicking the button above.

On February 1, 2024, Nomic released Nomic Embed. According to their release announcement, Nomic Embed is an:

  • Open source
  • Open data
  • Open training code
  • Fully reproducible and auditable

text embedding model with an 8192 context length that outperforms OpenAI Ada-002 and text-embedding-3-small on both short and long context tasks.

Using Nomic Embed with Elixir is surprisingly simple, as the model appears to be based on BERT, which is already well supported by Bumblebee.

First, ensure you have recent Bumblebee, Nx, and EXLA dependencies installed:

Mix.install([
  {:bumblebee, "~> 0.4.2"},
  {:nx, "~> 0.6.1"},
  {:exla, "~> 0.6.1"}
])

# Use EXLA's CPU client as the default backend for Nx operations
Nx.global_default_backend({EXLA.Backend, client: :host})

Then, load the model and tokenizer from Hugging Face. Note that we set the :architecture and :module options, as Bumblebee isn't directly aware of the nomic-embed-text-v1 model.

{:ok, model_info} = Bumblebee.load_model({:hf, "nomic-ai/nomic-embed-text-v1"}, architecture: :base, module: Bumblebee.Text.Bert)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "nomic-ai/nomic-embed-text-v1"}, module: Bumblebee.Text.BertTokenizer)

From there, creating a vector embedding is as simple as building a text embedding serving and running it with our input.

text = """
George Washington (February 22, 1732 – December 14, 1799) was an American Founding Father, military officer, politician and statesman who served as the first president of the United States from 1789 to 1797.
Appointed by the Second Continental Congress as commander of the Continental Army in 1775, Washington led Patriot forces to victory in the American Revolutionary War and then served as president of the Constitutional Convention in 1787, which drafted and ratified the Constitution of the United States and established the U.S. federal government.
Washington has thus been known as the "Father of the Nation".
"""

serving = Bumblebee.Text.text_embedding(model_info, tokenizer)

Nx.Serving.run(serving, text)

This returns a map with an :embedding key holding an Nx tensor of the embedding; you can pattern match it out of the result, as shown after the sample output below.

%{
  embedding: #Nx.Tensor<
    f32[768]
    [-0.4294808506965637, 0.350296288728714, 0.669066846370697, 0.6496680378913879, -0.5590441823005676, 0.628896951675415, 0.7099022269248962, 0.4480874836444855, -0.6229736804962158, -0.8553215861320496, -0.10225728154182434, -0.7555021643638611, -0.6679865121841431, 0.3394457697868347, 0.2579677104949951, -0.5158786177635193, -0.5474927425384521, -0.13224446773529053, -0.7986775040626526, 0.09828327596187592, -0.2675127685070038, 0.0653139129281044, 0.40657204389572144, -0.7644062638282776, -0.8136388659477234, 0.9279033541679382, -0.07347262650728226, 0.6894499063491821, 0.28578901290893555, -0.3797470033168793, 0.31256935000419617, 0.4728080928325653, 0.025486616417765617, 0.8153523206710815, -0.2568066120147705, 0.79371178150177, -0.6752384305000305, 0.6208856105804443, 0.42518243193626404, 0.23898668587207794, 0.08520907163619995, -0.556420087814331, 0.34825894236564636, 0.8740851879119873, -0.8007180690765381, 0.05230526626110077, 0.133464977145195, 0.7196198105812073, -0.2740887701511383, ...]
  >
}
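
To work with the raw vector elsewhere, you can pattern match the :embedding key out of the result and convert the tensor to a plain Elixir list with Nx.to_flat_list/1. A minimal sketch (the variable names are just for illustration):

%{embedding: embedding} = Nx.Serving.run(serving, text)

# Convert the 768-dimensional tensor into a flat list of floats
vector = Nx.to_flat_list(embedding)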

If you're running this in production, be sure to read the docs for the text_embedding serving to maximize performance, and consider batching requests. A sketch of one possible setup follows.
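
As a rough sketch of what that could look like, the serving can be compiled for fixed shapes and started under your application's supervision tree. The name MyApp.EmbeddingServing, the batch size, sequence length, and timeout below are assumptions for illustration, not recommended values:

# Pre-compile the serving for a fixed batch size and sequence length (assumed values)
serving =
  Bumblebee.Text.text_embedding(model_info, tokenizer,
    compile: [batch_size: 8, sequence_length: 512],
    defn_options: [compiler: EXLA]
  )

# Child spec to add to your application's supervision tree so that
# concurrent callers share batched inference
children = [
  {Nx.Serving,
   serving: serving,
   name: MyApp.EmbeddingServing,
   batch_size: 8,
   batch_timeout: 100}
]

# Callers then use batched_run/2 instead of run/2
Nx.Serving.batched_run(MyApp.EmbeddingServing, text)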