
I'm using langchain to process a large number of documents stored in a MongoDB database.

I can load all the documents into the Chroma vector store using langchain without problems. Nothing fancy is being done here. This is my code:


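from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# docs: the list of langchain Document objects loaded from MongoDB
embeddings = OpenAIEmbeddings()

# Embed the documents and store them in a Chroma collection persisted under ./db
db = Chroma.from_documents(docs, embeddings, persist_directory='db')
db.persist()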

Now, after storing the data, I want to get a list of all the documents and embeddings with their IDs.

This is so I can store them back into MongoDB.

I also want to put them through BERTopic to get the topic categories.

Question 1: how do I get all the documents I've just stored in the Chroma database? I want the documents and all the metadata.

Many thanks for your help!

user791793

1 Answer


Looking at the source code (https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/chroma.py), you can simply call:

db.get()

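and you will get a dict (JSON-like) with the IDs, documents and metadata. Note that in the Chroma versions I'm aware of, the embeddings are not returned by default (the field comes back as None); there you would have to request them explicitly, e.g. db.get(include=['embeddings', 'documents', 'metadatas']), provided the get() in your langchain version accepts an include argument.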
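Since the goal is to write the data back to MongoDB, here is a minimal sketch of turning that result into one record per document. It assumes the result is a dict of parallel lists under the keys "ids", "documents", "metadatas" and "embeddings", where any of the last three may be missing or None. The helper name chroma_result_to_records and the output field names (chroma_id, text, metadata, embedding) are made up for illustration, and the sample dict below only stands in for a real db.get() call.

```python
from typing import Any, Dict, List


def _column(result: Dict[str, Any], key: str, n: int) -> Any:
    """Return result[key], or n placeholders if the key is absent or None."""
    values = result.get(key)
    return [None] * n if values is None else values


def chroma_result_to_records(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Zip the parallel lists of a Chroma get() result into one dict per document."""
    ids = result["ids"]
    n = len(ids)
    documents = _column(result, "documents", n)
    metadatas = _column(result, "metadatas", n)
    embeddings = _column(result, "embeddings", n)

    records = []
    for doc_id, text, meta, emb in zip(ids, documents, metadatas, embeddings):
        records.append({
            "chroma_id": doc_id,  # hypothetical field name
            "text": text,
            "metadata": meta if meta is not None else {},
            # Plain Python floats, so the vector is easy to serialize.
            "embedding": None if emb is None else [float(x) for x in emb],
        })
    return records


# Stand-in for the dict returned by db.get(); the values are illustrative only.
sample = {
    "ids": ["a", "b"],
    "documents": ["first text", "second text"],
    "metadatas": [{"source": "mongo"}, None],
    "embeddings": None,
}
records = chroma_result_to_records(sample)
print(records)
```

With a live store this would be called as chroma_result_to_records(db.get()), and the resulting list could then be passed to something like pymongo's insert_many.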

carteakey