I am building a HuggingFace Space with LangChain (Gradio SDK) to chat with my data. I cloned Harrison Chase's Chat Your Data Space and went from there: I fixed a deprecation issue (see Discussion), switched to a DirectoryLoader so I can ingest multiple files, and now want to use Chroma instead of FAISS.
I'm pretty new to this, so I'm trying to make as few changes as possible. I want to keep using pickle as the original does, but store the embeddings in Chroma rather than FAISS. This is my ingestion code:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
import pickle
# Load Data
loader = DirectoryLoader('./data/')
raw_documents = loader.load()
# Split text
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(raw_documents)
# Load Data to vectorstore
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents=documents, embedding=embeddings)
# Save vectorstore
with open("vectorstore.pkl", "wb") as f:
    pickle.dump(vectorstore, f)
This shouldn't be a problem, right? Or is pickle totally unnecessary? Anyway, my main question is below.
My Space runs, but when I finally go to chat with my data I get the error AttributeError: 'OpenAIEmbeddings' object has no attribute 'deployment'. Here is the log:
...
File "/home/user/.local/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 334, in similarity_search
docs_and_scores = self.similarity_search_with_score(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 275, in similarity_search_with_score
embedding = self.embedding_function(query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 506, in embed_query
return self.embed_documents([text])[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 478, in embed_documents
return self._get_len_safe_embeddings(texts, engine=self.deployment)
^^^^^^^^^^^^^^^
AttributeError: 'OpenAIEmbeddings' object has no attribute 'deployment'
The log indicates it's using FAISS (.../langchain/vectorstores/faiss.py), but I'm not even using FAISS, I'm using Chroma. I got this same error when I was still using FAISS, which is why I thought switching to Chroma altogether might solve the issue.
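One possible explanation for the AttributeError, offered as a guess rather than a diagnosis: pickle restores an object's saved attributes without calling __init__, so an object pickled under an older version of a library can come back missing attributes that the newer code expects. The sketch below reproduces that effect with a made-up Embeddings class (hypothetical, not LangChain's real class); the attribute names only mirror the error message.

```python
import pickle


class Embeddings:
    """Hypothetical 'old version' of a class: it only has a `model` attribute."""

    def __init__(self):
        self.model = "some-model"


# Pickle an instance created by the old version of the class.
old_blob = pickle.dumps(Embeddings())


class Embeddings:  # hypothetical 'upgraded' version with a new attribute
    def __init__(self):
        self.model = "some-model"
        self.deployment = "some-deployment"

    def embed_query(self, text):
        # The new code relies on an attribute the old pickle never stored.
        return f"{self.deployment}:{text}"


# Unpickling looks the class up by name and restores the saved attributes;
# __init__ is not called, so `deployment` is never set on this instance.
restored = pickle.loads(old_blob)

try:
    restored.embed_query("hello")
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

If this is what is happening here, regenerating vectorstore.pkl with the currently installed library versions (rather than loading an older file) would be the thing to try first.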
Then, when I remove faiss-cpu from my requirements.txt file, the Space no longer runs and I get this error:
Traceback (most recent call last):
File "/home/user/app/app.py", line 17, in <module>
vectorstore = pickle.load(f)
^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'faiss'
Does anyone know why this is happening? Again, I'm a novice, so I could be missing something basic. Should I get rid of pickle and use Chroma with a persist directory?
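Because the ModuleNotFoundError is raised inside pickle.load, the vectorstore.pkl being loaded presumably still references FAISS classes, for example if it is the file that came with the original Space rather than one produced by the new ingestion script. A rough way to check, assuming nothing beyond the standard library, is to scan the pickle's opcodes for module names without actually loading it. The helper name pickle_string_args is made up for this sketch.

```python
import pickle
import pickletools
from fractions import Fraction


def pickle_string_args(data: bytes) -> set[str]:
    """Collect string arguments from a pickle stream without loading it.

    Module and class names show up here: protocols <= 3 use a GLOBAL opcode
    whose argument is "module name"; protocol 4+ pushes the module and the
    name as separate strings before STACK_GLOBAL.
    """
    names = set()
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            names.update(arg.split(" "))
        elif isinstance(arg, str):
            names.add(arg)
    return names


# Self-contained demo: a pickled Fraction references the "fractions" module.
demo_names = pickle_string_args(pickle.dumps(Fraction(1, 3)))
print("fractions" in demo_names)

# Hypothetical use on the Space's file (file name taken from the question):
# with open("vectorstore.pkl", "rb") as f:
#     names = pickle_string_args(f.read())
# prefixes = ("faiss", "langchain.vectorstores")
# print(sorted(n for n in names if n.startswith(prefixes)))
```

If names starting with faiss or langchain.vectorstores.faiss turn up, the file was written by a FAISS-based ingestion run and would need to be regenerated or replaced.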
Thanks in advance.