
I'm learning Faiss and trying to build an IndexFlatIP quantizer for an IndexIVFFlat index over 4,000,000 vectors with d = 256.

My code is as follows:

import numpy as np
import faiss

d = 256 # Dimension of each feature vector
n = 4000000 # Number of vectors
cells = 100 # Number of Voronoi cells

embeddings = np.random.rand(n, d) # Note: rand() returns float64

quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, cells)
index.train(embeddings) # Train the index

The code above works great, but when it comes to adding the embeddings to the index:

index.add(embeddings)

I get the following exception:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 4.01 GiB for an array with shape (4000000, 256) and data type float32

Seeing as this is a NumPy memory error, does it mean my index does not fit in memory? The machine I am using has 20.0 GB of RAM. If so, how can I work around this issue and correctly configure my index so that it fits into memory?
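(Aside from the overcommit/pagefile angle discussed in the comments, one thing worth noting: `np.random.rand` returns float64, and Faiss works on float32, so passing the whole array to `index.add()` forces a full float32 copy of all 4M vectors at once, which is exactly the 4.01 GiB allocation in the traceback. A sketch of one way to keep the peak temporary allocation small is to convert and add in batches; the batch size of 100,000 here is an arbitrary choice, and `index` stands for any Faiss index with an `add()` method:)

```python
import numpy as np

d = 256

# np.random.rand returns float64 (8 bytes per value), so the full
# (4_000_000, 256) matrix is ~7.6 GiB, and a one-shot float32
# conversion would need a further ~3.8 GiB temporary copy.
assert np.random.rand(2, d).dtype == np.float64

def add_in_batches(index, embeddings, batch_size=100_000):
    """Convert to float32 and add one batch at a time, so the
    temporary float32 copy is only (batch_size, d), not (n, d)."""
    for start in range(0, embeddings.shape[0], batch_size):
        batch = embeddings[start:start + batch_size].astype('float32')
        index.add(batch)
```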

Johnny
  • 4.01 GB seems suspiciously like a 32-bit limit. You can try [this solution](https://stackoverflow.com/a/22896826/17200348) if an array <4 GB _does_ work. – B Remmelzwaal Mar 30 '23 at 14:34
  • I have just double-checked that my Python is running as 64-bit with cmd: `python -c "import sys; print(sys.maxsize > 2**32)"` – Johnny Mar 30 '23 at 14:40
  • It's not about Python, but about NumPy. Have you checked the solution I linked? – B Remmelzwaal Mar 30 '23 at 14:43
  • Sorry, forgot to add: I have also checked that my `numpy` distribution (1.23.5) is running on 64 bits: `from numpy.distutils import system_info as sysinfo; print(sysinfo.platform_bits)` – Johnny Mar 30 '23 at 14:46
  • It might be worth a shot checking the answers on [this question](https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type). – B Remmelzwaal Mar 30 '23 at 14:48
  • 1
    Indeed, it was a memory overcommitment problem which I could only solve by manually increasing the dynamic pagefile size. – Johnny Mar 30 '23 at 15:08

0 Answers