
I'd like your opinion on storing vector ids together with their metadata. I want to use several processes to add vectors to FAISS in real time. I need a store that maps each FAISS vector id to its corresponding UUID, and a single UUID may have several distinct vectors (see the sample data below). Insertion must happen in real time, and looking up the UUID for a given FAISS vector id must be as fast as possible. When the data was static and there were around 2M vectors, I used NumPy arrays to map the ids to their matching UUIDs. This time, because insertion will be real-time and there could be billions of vectors, I need a more practical way to store those UUIDs. What do you think would be the best way to store this data?

  1. NZr9xeI0gu - [524.6 , 5.42, 7452.1,... ,124.6]
  2. NZr9xeI0gu - [10.8 , 7.02, 300.6,... ,785.0]
  3. NZr9xeI0gu - [485.0 , 504.0, 243.0,... ,5.09]
  4. GrM4dtQykW - [894.0 , 444.0, 0.00,... ,411.00]
  5. GrM4dtQykW - [9.0 , 845.0, 243.0,... ,850.79]
  6. VsgCjTNHxm - [0 , 174.0, 6.0, ... ,954.55]
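For illustration, the static mapping described above can be sketched as a positional lookup. This is only a minimal sketch: it uses a plain Python list in place of the NumPy array, assumes ids are 0-based and assigned sequentially in insertion order, and the helper names `uuid_for` and `append_vectors` are hypothetical. It is not safe for concurrent writers.

```python
# Sample UUIDs copied from the rows above; the list position stands in for
# the FAISS vector id (assumed 0-based and sequential).
uuids = [
    "NZr9xeI0gu", "NZr9xeI0gu", "NZr9xeI0gu",
    "GrM4dtQykW", "GrM4dtQykW",
    "VsgCjTNHxm",
]


def uuid_for(faiss_id: int) -> str:
    """Return the UUID stored at position `faiss_id`."""
    return uuids[faiss_id]


def append_vectors(uuid: str, count: int) -> range:
    """Record `count` new vectors for `uuid` and return the ids they would get."""
    start = len(uuids)
    uuids.extend([uuid] * count)
    return range(start, start + count)
```

The lookup is a single index operation, which is why this works well for static data. With several writer processes, the append step is where a shared, synchronized store would be needed.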
  • I put the metadata in a JSON file whose filename is the vector id (split across subdirectories). However, I'm far from sure this is optimal. I'm still searching for an efficient way to associate metadata with FAISS vectors. To some extent [milvus](https://milvus.io/) can do that, but that is a whole other story. – Laurent Claessens Sep 16 '22 at 08:28
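The one-JSON-file-per-vector layout mentioned in the comment above could look roughly like the sketch below. The sharding scheme (two directory levels derived from the id, at most 1000 files per leaf directory) and the function names are assumptions for illustration, not the commenter's actual code.

```python
import json
from pathlib import Path


def metadata_path(root: Path, vector_id: int) -> Path:
    """Build a sharded path such as root/1/234/1234567.json for id 1234567."""
    top = str(vector_id // 1_000_000)            # millions bucket
    mid = f"{(vector_id // 1000) % 1000:03d}"    # thousands bucket within it
    return root / top / mid / f"{vector_id}.json"


def save_metadata(root: Path, vector_id: int, metadata: dict) -> None:
    """Write the metadata for one vector id, creating directories as needed."""
    path = metadata_path(root, vector_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata))


def load_metadata(root: Path, vector_id: int) -> dict:
    """Read back the metadata stored for one vector id."""
    return json.loads(metadata_path(root, vector_id).read_text())
```

Each lookup costs one file open, and writers for different ids never touch the same file, but billions of small files may strain the filesystem.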
