Can I 'inner-search' the most similar vectors within a FAISS index?

Question

I have a FAISS index populated with 8M embedding vectors. I don't have the embedding vectors anymore, only the index, and it is expensive to recompute the embeddings.

Can I search the index for the top-k most similar vectors to each of the index's vectors?

To be more concrete, say this is how my index was populated:

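import numpy as np
import faiss

d = 1024
N = 100
embeddings = np.random.rand(N, d).astype('float32')  # FAISS expects float32 vectors
ids = np.arange(N).astype('int64')                    # add_with_ids expects int64 ids
index = faiss.index_factory(
    d, 'IDMap,Flat', faiss.METRIC_INNER_PRODUCT
)
index.add_with_ids(embeddings, ids)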

I would like to get D, I such that:

D, I = index.search(embeddings, k)

but I don't have access to embeddings anymore, I only have the index.

I tried using index.reconstruct() to get back my (approximated?) embeddings, but I ran into:

RuntimeError: Error in virtual void faiss::Index::reconstruct(faiss::Index::idx_t, float*) const at /root/miniconda3/conda-bld/faiss-pkg_1613228717761/work/faiss/Index.cpp:57: reconstruct not implemented for this type of index

Fedor · Answer 1 · 2022-11-17T10:10:16.973


First of all, it seems like you forgot to call train() on your embeddings before add().

As for your question, you can simply keep a copy of the embeddings before adding them to the index.

edited Nov 17 '22 at 10:10

answered Nov 17 '22 at 10:09


I did train the embeddings, of course, and that is the expensive part I don't want to redo. That is why I was hoping to search for the top similarities 'within' the index without recomputing the embeddings. The code I posted uses random embeddings only to show how I populated the index. – user7980 Nov 17 '22 at 10:30
