Haystack + Pinecone Hybrid Vectors
Recently, Pinecone announced support for sparse-dense embeddings, allowing for hybrid vector search. This is pretty awesome as it allows you to support both keyword-style queries that require exact matches (with the sparse vectors) and semantic queries that understand the intention of the query (with dense vectors). The two components of this hybrid vector are sent separately to Pinecone, which stores them as a unified vector. Your ANN results then incorporate the score from both components, with a configurable α to weight one against the other. The usual convention is: \( \alpha \cdot \text{dense} + (1-\alpha) \cdot \text{sparse} \), so α = 1 is purely dense (semantic) and α = 0 is purely sparse (keyword).
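To make the α weighting concrete, here is a minimal sketch in plain Python. The function names weight_hybrid and dot_score are made up for illustration and are not part of Pinecone or haystack. The sparse vector is written as a dict of indices and values, which mirrors how Pinecone accepts sparse values, and the sketch assumes dot-product scoring, so scaling the query side is enough to apply the weighting.

```python
from typing import Dict, List, Tuple

# Assumed sparse layout: {"indices": [int, ...], "values": [float, ...]}
SparseVector = Dict[str, list]


def weight_hybrid(
    dense: List[float], sparse: SparseVector, alpha: float
) -> Tuple[List[float], SparseVector]:
    """Scale the dense part by alpha and the sparse part by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


def dot_score(
    q_dense: List[float],
    q_sparse: SparseVector,
    d_dense: List[float],
    d_sparse: SparseVector,
) -> float:
    """Dot product over both components (hypothetical scoring helper)."""
    dense_part = sum(q * d for q, d in zip(q_dense, d_dense))
    # Only indices present in both sparse vectors contribute.
    doc_lookup = dict(zip(d_sparse["indices"], d_sparse["values"]))
    sparse_part = sum(
        v * doc_lookup.get(i, 0.0)
        for i, v in zip(q_sparse["indices"], q_sparse["values"])
    )
    return dense_part + sparse_part


if __name__ == "__main__":
    query_dense = [1.0, 2.0]
    query_sparse = {"indices": [3, 7], "values": [4.0, 2.0]}
    # alpha=0.8 leans toward the dense (semantic) component.
    q_dense, q_sparse = weight_hybrid(query_dense, query_sparse, alpha=0.8)
    print(q_dense, q_sparse)
```

Because the dot product is linear, weighting only the query vector gives the same ranking as weighting the two scores separately, which is why the document vectors can be stored unscaled.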
I've recently been on a haystack kick and it's fantastic; I find both the code and docs to be way higher quality than langchain (though I am still falling back to langchain for some complex agents). Sadly, haystack doesn't currently support hybrid vectors. Since I wanted to get cracking playing with this, I built a little library called haystack-hybrid-embedding which adds hybrid vector support to haystack.
Just pip install haystack-hybrid-embedding and you're off!
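from haystack_hybrid_embedding import SpladeEmbeddingEncoder
from haystack_hybrid_embedding.pinecone import PineconeHybridDocumentStore, SparseDenseRetriever

# Document store that keeps the sparse and dense components together
document_store = PineconeHybridDocumentStore(...)

retriever = SparseDenseRetriever(
    sparse_encoder=SpladeEmbeddingEncoder(),  # SPLADE produces the sparse vectors
    alpha=0.8,  # weight on the dense component; sparse gets 1 - alpha
    ...
)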
Hopefully we get native support in haystack soon, but for now feel free to use haystack-hybrid-embedding to bridge the gap.