How can I fetch all the vectors in a namespace in Pinecone? The fetch method expects the IDs of the vectors. Is there any method to get all the vector IDs?
Please provide enough code so others can better understand or reproduce the problem. – Community May 31 '23 at 08:11
2 Answers
OK, I struggled with this a lot, but I finally found a solution. I was building my first hello world with Pinecone: I upserted some data, and to make sure it had really been upserted I wanted to get all the vectors back from a namespace.
Just run a query against Pinecone and set topK to a value at least as large as the number of vectors in the namespace. You will then get every vector back, no matter what the query is. This only works while the namespace holds no more vectors than the maximum topK Pinecone accepts.
For example, I have only 54 vectors in my Pinecone index, so if I set topK to 100 it returns all the documents, whatever text I put in the query, even an empty string.
Here is my code for reference. Sorry, it is in JavaScript (ES modules), but the same approach should work in Python:
// Assumes `openai` (an OpenAIApi instance from the openai v3 SDK) and `pinecone`
// (an initialized PineconeClient) are created elsewhere.
const queryPineconeIndex = async (queryText, numberOfResults) => {
  // Embed the query text; its content does not matter once topK >= vector count
  const response = await openai.createEmbedding({
    model: "text-embedding-ada-002",
    input: queryText,
  });
  const vector = response?.data?.data?.[0]?.embedding;
  console.log("vector: ", vector);
  // [ 0.0023063174, -0.009358601, 0.01578391, ... , 0.01678391, ]

  const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);
  const queryResponse = await index.query({
    queryRequest: {
      vector: vector,
      // id: "vec1",
      topK: numberOfResults, // use a value >= the number of vectors to get all of them
      includeValues: true,
      includeMetadata: true,
      namespace: process.env.PINECONE_NAME_SPACE,
    },
  });

  // forEach, not map: we only log each match
  queryResponse.matches.forEach((eachMatch) => {
    console.log(`score ${eachMatch.score.toFixed(1)} => ${JSON.stringify(eachMatch.metadata)}\n\n`);
  });
  console.log(`${queryResponse.matches.length} records found`);
};

queryPineconeIndex("any text or empty string", 100);
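To see why the large-topK trick works, here is a minimal, self-contained Python sketch of a nearest-neighbour query. InMemoryIndex is a hypothetical stand-in written for illustration, not Pinecone's client: it ranks every stored vector by cosine similarity to the query vector and then keeps the first top_k IDs. Because the cut-off is applied after ranking, any top_k at least as large as the number of stored vectors returns every ID, whatever the query vector is.

```python
import math
import random


class InMemoryIndex:
    """Hypothetical stand-in for a vector index (not the Pinecone client)."""

    def __init__(self):
        self.vectors = {}  # id -> list of floats

    def upsert(self, vec_id, values):
        self.vectors[vec_id] = values

    def query(self, vector, top_k):
        """Return up to top_k IDs ranked by cosine similarity to `vector`."""

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(
            self.vectors,
            key=lambda vec_id: cosine(vector, self.vectors[vec_id]),
            reverse=True,
        )
        return ranked[:top_k]  # truncation happens only after ranking


random.seed(0)
index = InMemoryIndex()
for i in range(54):  # 54 vectors, as in the example above
    index.upsert(f"vec{i}", [random.uniform(-1, 1) for _ in range(8)])

# Any query vector returns every ID once top_k >= number of stored vectors.
query = [random.uniform(-1, 1) for _ in range(8)]
all_ids = index.query(query, top_k=100)
print(len(all_ids))  # 54
```

The query vector only changes the order of the results, not which IDs come back, once top_k covers the whole namespace. A real index may use approximate search, so treat this as an illustration of the ranking logic rather than of Pinecone's internals.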
If you don't know how many vectors you have in an index, you can get the count like this:
const getIndexStats = async () => {
  const indexesList = await pinecone.listIndexes();
  console.log("indexesList: ", indexesList);

  const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);
  const indexStats = await index.describeIndexStats({
    describeIndexStatsRequest: {
      filter: {},
    },
  });
  // The stats include the total vector count and a per-namespace breakdown
  console.log("indexStats: ", indexStats);
};
// getIndexStats()
The complete code is in my GitHub repo: https://github.com/mInzamamMalik/vector-database-hello-world
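The vector count reported by the index stats can tell you whether a single query is enough. A minimal Python sketch, assuming the stats are a dict shaped like the one read in the next answer (stats["namespaces"][namespace]["vector_count"]). top_k_for_namespace is a hypothetical helper, not part of any Pinecone SDK, and the default max_top_k of 10,000 is an assumption matching the top_k=10000 used in the next answer, so check Pinecone's current limits.

```python
def top_k_for_namespace(stats, namespace="", max_top_k=10_000):
    """Return a top_k large enough to return every vector in `namespace`.

    `stats` is assumed to be a dict shaped like
    {"namespaces": {"<name>": {"vector_count": <int>}}}.
    max_top_k is an assumed service limit, passed in by the caller.
    Raises ValueError when a single query cannot cover the namespace.
    """
    namespaces = stats.get("namespaces", {})
    if namespace not in namespaces:
        return 0  # unknown or empty namespace: no query needed
    count = namespaces[namespace]["vector_count"]
    if count > max_top_k:
        raise ValueError(
            f"{count} vectors exceed max top_k {max_top_k}; "
            "a single query cannot return them all"
        )
    return count


stats = {"namespaces": {"": {"vector_count": 54}}}
print(top_k_for_namespace(stats))  # 54
```

If the helper raises, the single-query trick is not enough and you need something like the ID-collection workaround in the next answer.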

It's still pretty ridiculous that we have to do this, but here is a workaround for getting all the IDs so you can then fetch all the vectors:
import numpy as np


def get_ids_from_query(index, input_vector, namespace=""):
    """Return the set of IDs matched by one query (IDs only, no values)."""
    print("searching pinecone...")
    results = index.query(
        vector=input_vector,
        top_k=10000,
        namespace=namespace,  # query the same namespace we are counting
        include_values=False,
    )
    return {result["id"] for result in results["matches"]}


def get_all_ids_from_index(index, num_dimensions, namespace="", max_queries=1000):
    """Collect IDs with random query vectors until every vector has been seen."""
    num_vectors = index.describe_index_stats()["namespaces"][namespace]["vector_count"]
    all_ids = set()
    queries = 0
    # max_queries (an arbitrary cap) stops the loop if some vectors never come back
    while len(all_ids) < num_vectors and queries < max_queries:
        print("creating random vector...")
        input_vector = np.random.rand(num_dimensions).tolist()
        print("getting ids from a vector query...")
        all_ids.update(get_ids_from_query(index, input_vector, namespace))
        queries += 1
        print(f"Collected {len(all_ids)} ids out of {num_vectors}.")
    if len(all_ids) < num_vectors:
        print(f"Stopped after {queries} queries with {len(all_ids)} of {num_vectors} ids.")
    return all_ids


# `index` is a Pinecone Index object created earlier
all_ids = get_all_ids_from_index(index, num_dimensions=1536, namespace="")
print(all_ids)
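Once the IDs are collected, the vectors still have to be fetched by ID. A minimal sketch of doing that in batches, assuming a fetch callable that takes a list of IDs and returns a dict mapping each ID to its vector. fake_fetch below is a hypothetical stand-in; with the real client it would wrap index.fetch(ids=..., namespace=...), whose response shape depends on the client version. The batch size of 100 is an arbitrary assumption; Pinecone caps the number of IDs per fetch request, so check the current limit.

```python
from itertools import islice


def batched_ids(ids, batch_size):
    """Yield lists of at most `batch_size` IDs from any iterable of IDs."""
    iterator = iter(ids)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def fetch_all(ids, fetch, batch_size=100):
    """Fetch every ID in batches and merge the results into one dict.

    `fetch` is assumed to accept a list of IDs and return {id: vector}.
    IDs are sorted only so that the batches are deterministic.
    """
    vectors = {}
    for batch in batched_ids(sorted(ids), batch_size):
        vectors.update(fetch(batch))
    return vectors


# Hypothetical stand-in for the real fetch call, backed by a plain dict.
store = {f"vec{i}": [float(i)] * 3 for i in range(250)}
calls = []


def fake_fetch(batch):
    calls.append(len(batch))
    return {vec_id: store[vec_id] for vec_id in batch}


result = fetch_all(set(store), fake_fetch, batch_size=100)
print(len(result), calls)  # 250 [100, 100, 50]
```

Batching keeps each request under the per-call limit, and merging into one dict means duplicate IDs (for example from overlapping query results) are harmless.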
