
I have a FAISS index populated with 8M embedding vectors. I don't have the embedding vectors anymore, only the index, and it is expensive to recompute the embeddings.

Can I search the index for the top-k most similar vectors to each of the index's vectors?

To be more concrete, say this is how my index was populated:

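import faiss
import numpy as np

d = 1024
N = 100
# FAISS expects float32 vectors and int64 ids
embeddings = np.random.rand(N, d).astype('float32')
ids = np.arange(N, dtype='int64')
index = faiss.index_factory(
    d, 'IDMap,Flat', faiss.METRIC_INNER_PRODUCT
)
index.add_with_ids(embeddings, ids)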

I would like to get D, I such that:

D, I = index.search(embeddings, k) 

but I no longer have access to embeddings; I only have the index.

I tried using index.reconstruct() to get back my (approximated?) embeddings, but I ran into:

RuntimeError: Error in virtual void faiss::Index::reconstruct(faiss::Index::idx_t, float*) const at /root/miniconda3/conda-bld/faiss-pkg_1613228717761/work/faiss/Index.cpp:57: reconstruct not implemented for this type of index
STerliakov
user7980

1 Answer


First of all, it seems you forgot to call train() on the index before calling add().

As for your question, you could simply keep a copy of the embeddings before adding them to the index.
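For future index builds, a minimal sketch of keeping such a copy on disk with NumPy. The file name embeddings.npy and the temporary directory are placeholders, not anything from the question:

```python
import os
import tempfile

import numpy as np

d, N = 1024, 100
embeddings = np.random.rand(N, d).astype("float32")

# Placeholder location; in practice this would be a durable path.
path = os.path.join(tempfile.gettempdir(), "embeddings.npy")
np.save(path, embeddings)

# Later: memory-map the file so the full array need not fit in RAM.
restored = np.load(path, mmap_mode="r")
print(restored.shape, restored.dtype)
```

Slices of the memory-mapped array can be converted with np.ascontiguousarray() before being passed to index.search().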

Fedor
  • I did train the embeddings of course, and that is the expensive part that I don't want to do again. That is why I was hoping to be able to search top similarities 'within' the index without having to compute the embeddings again. The code that I posted uses random embeddings but that is just to show how I have populated the index – user7980 Nov 17 '22 at 10:30