Is there a method to fetch all the vectors of a namespace in Pinecone?

Question

How can I fetch all the vectors of a namespace in Pinecone? The fetch method expects the IDs of the vectors. Is there any method to get all the IDs of the vectors?

Please provide enough code so others can better understand or reproduce the problem. — Community, May 31 '23 at 08:11

score 3 · Accepted Answer · answered Jul 19 '23 at 17:16

OK, I struggled with this a lot, but I finally found a solution. I was just trying to build my first hello world with Pinecone. I added some data and, to make sure it was really upserted, I wanted to get all the vectors back from a namespace.

Just run a query against Pinecone with `topK` set at least as high as the number of vectors in the namespace (up to the maximum Pinecone allows), and you will get all the vectors back no matter what your query is. This only works while the namespace holds no more vectors than `topK`.

For example, I have only 54 vectors in my Pinecone index, so if I set `topK` to 100 it returns all the documents, no matter what text I give in the query, even if I leave the query text empty.

Here is my code for reference. Sorry, it is an ES module (JavaScript), but I am sure it works the same way in Python:

// Assumes the `openai` and `pinecone` clients are already initialized elsewhere in the module.
const queryPineconeIndex = async (queryText, numberOfResults) => {
    // Embed the query text; the Pinecone query needs a vector, not raw text
    const response = await openai.createEmbedding({
        model: "text-embedding-ada-002",
        input: queryText,
    });
    const vector = response?.data?.data?.[0]?.embedding;
    console.log("vector: ", vector);
    // [ 0.0023063174, -0.009358601, 0.01578391, ... , 0.01678391, ]

    const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);
    const queryResponse = await index.query({
        queryRequest: {
            vector: vector,
            // id: "vec1",
            topK: numberOfResults,
            includeValues: true,
            includeMetadata: true,
            namespace: process.env.PINECONE_NAME_SPACE
        }
    });

    // Print the similarity score and metadata of each match
    queryResponse.matches.forEach(eachMatch => {
        console.log(`score ${eachMatch.score.toFixed(1)} => ${JSON.stringify(eachMatch.metadata)}\n\n`);
    });
    console.log(`${queryResponse.matches.length} records found `);
};

queryPineconeIndex("any text or empty string", 100);
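
Since the answer says the same approach should work in Python, here is a minimal sketch of what that might look like. It assumes `index` is a Pinecone index object whose `query` method accepts `vector`, `top_k`, `include_values`, `include_metadata` and `namespace`, and that the response can be read like a dict (as in the Python answer further down). The function name `fetch_all_via_query` and the constant probe vector are made up for illustration; any query vector of the right dimension should do.

```python
from typing import Any, Dict, List


def fetch_all_via_query(index: Any, dimension: int, top_k: int,
                        namespace: str = "") -> List[Dict[str, Any]]:
    """Return all matches from a single query.

    Only returns every vector if top_k >= number of vectors in the namespace.
    """
    # Hypothetical probe vector: its content is arbitrary, only the
    # dimension has to match the index.
    probe = [0.1] * dimension
    response = index.query(
        vector=probe,
        top_k=top_k,
        include_values=True,
        include_metadata=True,
        namespace=namespace,
    )
    # Assumption: the response supports dict-style access to "matches".
    return list(response["matches"])
```

With the 54-vector index from the example, calling `fetch_all_via_query(index, dimension=1536, top_k=100)` would be the Python counterpart of the JavaScript call above, with 1536 being the dimension of `text-embedding-ada-002` embeddings.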

If you don't know how many vectors you have in an index, you can get the count like this:

const getIndexStats = async () => {
    const indexesList = await pinecone.listIndexes();
    console.log("indexesList: ", indexesList);

    const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);
    const indexStats = await index.describeIndexStats({
        describeIndexStatsRequest: {
            filter: {},
        },
    });
    // The stats include the vector count for each namespace
    console.log("indexStats: ", indexStats);
};
// getIndexStats()

The complete code is in my GitHub repo: https://github.com/mInzamamMalik/vector-database-hello-world

score 1 · Answer 2 · answered Jul 08 '23 at 21:20

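It's still pretty ridiculous that we have to do this, but here is a workaround for getting all the IDs so you can download all the vectors. It keeps querying with random vectors and collects the returned IDs until the number of distinct IDs matches the vector count reported for the namespace: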

import numpy as np

def get_ids_from_query(index, input_vector, namespace=""):
    # One query returns at most top_k matches; only the ids are needed here
    print("searching pinecone...")
    results = index.query(vector=input_vector, top_k=10000,
                          include_values=False, namespace=namespace)
    ids = set()
    for result in results['matches']:
        ids.add(result['id'])
    return ids

def get_all_ids_from_index(index, num_dimensions, namespace=""):
    num_vectors = index.describe_index_stats()["namespaces"][namespace]['vector_count']
    all_ids = set()
    # Keep querying with random vectors until every id has been seen.
    # Note: this loop has no upper bound and can take many queries on large namespaces.
    while len(all_ids) < num_vectors:
        print("Length of ids list is shorter than the number of total vectors...")
        print("creating random vector...")
        input_vector = np.random.rand(num_dimensions).tolist()
        print("getting ids from a vector query...")
        ids = get_ids_from_query(index, input_vector, namespace)
        print("updating ids set...")
        all_ids.update(ids)
        print(f"Collected {len(all_ids)} ids out of {num_vectors}.")

    return all_ids

all_ids = get_all_ids_from_index(index, num_dimensions=1536, namespace="")
print(all_ids)
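
Once the IDs are collected, the actual download could be done with the fetch method mentioned in the question. The following is only a sketch under some assumptions: `index.fetch(ids=..., namespace=...)` is called in batches, the response is assumed to expose a dict-like `"vectors"` entry keyed by ID, and both the helper name `fetch_vectors_in_batches` and the default `batch_size` of 100 are hypothetical choices rather than documented limits.

```python
from typing import Any, Dict, Iterable, List


def fetch_vectors_in_batches(index: Any, ids: Iterable[str],
                             namespace: str = "",
                             batch_size: int = 100) -> Dict[str, Any]:
    """Fetch vectors by id in fixed-size batches and merge the results."""
    # Sort so the batches are deterministic even when ids is a set
    id_list: List[str] = sorted(ids)
    vectors: Dict[str, Any] = {}
    for start in range(0, len(id_list), batch_size):
        batch = id_list[start:start + batch_size]
        response = index.fetch(ids=batch, namespace=namespace)
        # Assumption: the response supports dict-style access to "vectors"
        vectors.update(response["vectors"])
    return vectors
```

For example, `fetch_vectors_in_batches(index, all_ids, namespace="")` would take the set returned by `get_all_ids_from_index` above. Batching keeps each request small; the right batch size depends on the vector dimension and the client's request limits.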
