How can I fetch all the vectors of a namespace in Pinecone? The fetch method expects the IDs of the vectors. Is there any method to get all the IDs of the vectors?

Inzamam Malik

2 Answers

OK, I struggled with this a lot, but I finally found a solution. I was just trying to write my first hello world with Pinecone: I added some data and, to make sure it was really upserted, I wanted to get all the vectors back from a namespace.

Just run a query against Pinecone with topK set to its maximum (or at least to the number of vectors you have), and you will get all vectors back no matter what your query is.

For example, I have only 54 vectors in my Pinecone index, so if I set topK to 100 it returns all documents, whatever text I put in the query, even if I leave it empty.

Here is my code for reference. Sorry, it is an ES Module (JavaScript), but I am sure it will work the same way in Python:

const queryPineconeIndex = async (queryText, numberOfResults) => {
    // Embed the query text. The content does not matter here,
    // because topK is large enough to return every vector anyway.
    const response = await openai.createEmbedding({
        model: "text-embedding-ada-002",
        input: queryText,
    });
    const vector = response?.data?.data?.[0]?.embedding;
    console.log("vector: ", vector);
    // [ 0.0023063174, -0.009358601, 0.01578391, ... , 0.01678391, ]

    const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);
    const queryResponse = await index.query({
        queryRequest: {
            vector: vector,
            // id: "vec1",
            topK: numberOfResults, // at least the number of vectors in the namespace
            includeValues: true,
            includeMetadata: true,
            namespace: process.env.PINECONE_NAME_SPACE
        }
    });

    // Print the score and metadata of every match.
    queryResponse.matches.forEach(eachMatch => {
        console.log(`score ${eachMatch.score.toFixed(1)} => ${JSON.stringify(eachMatch.metadata)}\n\n`);
    });
    console.log(`${queryResponse.matches.length} records found `);
};

queryPineconeIndex("any text or empty string", 100);

If you don't know how many vectors you have in an index, you can get the count like this:

const getIndexStats = async () => {
    const indexesList = await pinecone.listIndexes();
    console.log("indexesList: ", indexesList);

    const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);
    // The stats include the vector count of each namespace.
    const indexStats = await index.describeIndexStats({
        describeIndexStatsRequest: {
            filter: {},
        },
    });
    console.log("indexStats: ", indexStats);
};
// getIndexStats()
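
For anyone on Python, here is a minimal sketch of the same idea that combines the two snippets above: read the vector count from the index stats, then query with top_k set to that count. The function name `query_all_vectors` and the `max_top_k` parameter are my own, not part of Pinecone. I am assuming the responses can be read with dictionary-style subscripts, and that `top_k` is capped at 10000 (Pinecone does cap it, and the cap may be lower when values or metadata are included), so this is only useful for small namespaces.

```python
def query_all_vectors(index, query_vector, namespace="", max_top_k=10000):
    """Return every match in a namespace by querying with top_k = vector count.

    `index` is assumed to be a Pinecone index object; `query_vector` is any
    vector of the right dimension (its content does not matter here).
    `max_top_k` is an assumed cap, adjust it to your plan's actual limit.
    """
    stats = index.describe_index_stats()
    namespaces = stats["namespaces"]
    if namespace not in namespaces:
        return []  # nothing has been upserted into this namespace

    count = namespaces[namespace]["vector_count"]
    if count == 0:
        return []

    response = index.query(
        vector=query_vector,
        top_k=min(count, max_top_k),  # cannot return more than the cap allows
        include_values=True,
        include_metadata=True,
        namespace=namespace,
    )
    return response["matches"]
```

You would call it as `query_all_vectors(index, some_vector, namespace="my-namespace")`. If the namespace holds more vectors than the cap, you only get the nearest `max_top_k` of them, not all.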

The complete code is in my GitHub repo: https://github.com/mInzamamMalik/vector-database-hello-world

Inzamam Malik

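It's still pretty ridiculous that we have to do this, but here is a workaround for getting all the IDs so you can download all the vectors. It queries with random vectors, 10000 results at a time, until it has collected as many distinct IDs as the namespace contains: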

import numpy as np

def get_ids_from_query(index, input_vector, namespace=""):
  # Query the given namespace and keep only the IDs; values are not needed here.
  print("searching pinecone...")
  results = index.query(vector=input_vector, top_k=10000, include_values=False, namespace=namespace)
  ids = set()
  for result in results['matches']:
    ids.add(result['id'])
  return ids

def get_all_ids_from_index(index, num_dimensions, namespace=""):
  num_vectors = index.describe_index_stats()["namespaces"][namespace]['vector_count']
  all_ids = set()
  # Keep querying with fresh random vectors until every ID has been seen.
  while len(all_ids) < num_vectors:
    input_vector = np.random.rand(num_dimensions).tolist()
    ids = get_ids_from_query(index, input_vector, namespace)
    all_ids.update(ids)
    print(f"Collected {len(all_ids)} ids out of {num_vectors}.")

  return all_ids

all_ids = get_all_ids_from_index(index, num_dimensions=1536, namespace="")
print(all_ids)
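
Once you have the IDs, downloading the vectors is a matter of calling fetch in batches. Below is a rough sketch of that step, not an official recipe: `fetch_all_vectors` and `batch_size` are names I made up, the batch size of 100 is an arbitrary assumption (pick whatever your request size limits allow), and I am assuming the fetch response exposes a `vectors` mapping through subscript access, as the query responses above do. Newer client versions may need `response.vectors` instead.

```python
def chunked(ids, size):
    """Yield successive lists of at most `size` IDs."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fetch_all_vectors(index, ids, namespace="", batch_size=100):
    """Fetch the vectors for `ids` in batches and merge them into one dict.

    `index` is assumed to be a Pinecone index object whose fetch() accepts
    `ids` and `namespace`; `batch_size` is a hypothetical, tunable value.
    """
    vectors = {}
    for batch in chunked(ids, batch_size):
        response = index.fetch(ids=batch, namespace=namespace)
        # Assumed response shape: {"vectors": {id: vector, ...}}
        vectors.update(response["vectors"])
    return vectors
```

With the set from the code above, usage would look like `vectors = fetch_all_vectors(index, all_ids, namespace="")`.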
Tyler S