10

I have a folder with multiple CSV files, and I'm trying to figure out a way to load them all into LangChain and ask questions over all of them.

Here's what I have so far.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI, VectorDBQA
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.csv_loader import CSVLoader
import os

os.environ['OPENAI_API_KEY'] = '...'

loader = DirectoryLoader('../data/', glob='**/*.csv', loader_cls=CSVLoader)

documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)

texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])

docsearch = Chroma.from_documents(texts, embeddings)

qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=docsearch)

query = "how many females are present?"
qa.run(query)
Dave Kalu

3 Answers

1

You should load them all into a vector store such as Pinecone or Metal, then use a RetrievalQA chain or a ConversationalRetrievalChain, depending on whether you need conversational memory.
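
As a rough illustration of the loading step only, the sketch below walks a folder, reads every CSV with the standard library, and turns each row into a small text document tagged with its source file, which is roughly the shape a vector store takes as input. The helper name `load_csv_rows` and the `page_content`/`metadata` keys are assumptions chosen for illustration, not LangChain's API; embedding and the retrieval chain are left out.

```python
import csv
from pathlib import Path


def load_csv_rows(folder):
    """Return one dict per CSV row found under `folder` (searched recursively)."""
    docs = []
    for path in sorted(Path(folder).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row_number, row in enumerate(csv.DictReader(f)):
                # Render the row as "column: value" lines so it embeds as plain text.
                text = "\n".join(f"{key}: {value}" for key, value in row.items())
                docs.append({
                    "page_content": text,
                    # Keep the origin so an answer can be traced back to a file and row.
                    "metadata": {"source": str(path), "row": row_number},
                })
    return docs
```

Keeping the source file in the metadata is a design choice: once rows from several files sit in one index, it is the only way to tell which file a retrieved row came from.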

Sébastien Lavoie
Ismailp
0

I'm not sure whether you want to combine multiple CSV files for a single query or compare them. If you want to compare them or see the differences between them, you can use the same approach as for querying a single file; see https://python.langchain.com/en/latest/modules/agents/toolkits/examples/csv.html

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI

agent = create_csv_agent(OpenAI(temperature=0), ['titanic.csv', 'titanic_age_fillna.csv'], verbose=True)
agent.run("how many rows in the age column are different?")
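
If your LangChain version does not accept a list of paths (an assumption, since behaviour may differ between releases), one possible workaround for the "combine" case is to merge the files into a single CSV first and pass that one path to the agent. The sketch below uses only the standard library; `merge_csv_files` and the `source_file` column are hypothetical names introduced for illustration, and columns missing from a given file are left empty.

```python
import csv
from pathlib import Path


def merge_csv_files(paths, out_path, source_column="source_file"):
    """Write all rows from `paths` into one CSV, tagging each row with its origin."""
    rows, columns = [], [source_column]
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Collect the union of column names, preserving first-seen order.
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            for row in reader:
                row[source_column] = Path(path).name
                rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # restval="" fills columns a file lacked; stray extra fields are ignored.
        writer = csv.DictWriter(
            f, fieldnames=columns, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The merged file could then be passed as a single path, for example `create_csv_agent(OpenAI(temperature=0), 'merged.csv', verbose=True)`, and the `source_file` column lets a question distinguish between the original files.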


  • What version of LangChain is that? With the latest version I get an error when passing a list of files. – vaz May 28 '23 at 16:10
-1

I think your code is meant for question answering over text files, not CSV files.

For question answering over CSV files, you can use:

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI

agent = create_csv_agent(OpenAI(temperature=0), ['titanic.csv', 'titanic_age_fillna.csv'], verbose=True)

You can then ask the agent questions:

query = "how many females are present?"
agent.run(query)
Dattatray
  • This applies to a single CSV file, which the OP presumably knew how to do. The question is about doing it across multiple CSV files, which this answer fails to address. – onlyphantom May 02 '23 at 12:30
  • This doesn't answer the original question, which was how to read multiple files. – Kiryl A. May 25 '23 at 12:26
  • For multiple CSV files: **agent = create_csv_agent(OpenAI(temperature=0), ['titanic.csv', 'titanic_age_fillna.csv'], verbose=True)** – Dattatray May 29 '23 at 12:14