
I have a DataFrame with a column that needs to be filled with values from a big csv file. What would be the most memory- and computation-efficient way to load the csv file and left-join its data onto the DataFrame?

The approaches I have tried/considered:

  1. Load the csv file as a DataFrame and use pandas functions to join: however, loading the whole csv into memory fails with a MemoryError. A chunked variant I am considering is sketched after this list.
  2. Load the csv file into a database and use a left-join query: I have not tried this yet, but I hope it avoids the MemoryError.
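
To make approach 1 concrete, the chunked variant I have been considering looks roughly like this (file and column names are placeholders):

```python
import pandas as pd

# Toy stand-in for the existing DataFrame; "key" is a placeholder join column.
df = pd.DataFrame({"key": [1, 2, 3]})

pieces = []
for chunk in pd.read_csv("big.csv", chunksize=100_000):
    # Keep only the rows of the big file that can actually match df,
    # so the concatenated lookup table stays small.
    pieces.append(chunk[chunk["key"].isin(df["key"])])

lookup = pd.concat(pieces, ignore_index=True)
result = df.merge(lookup, on="key", how="left")
```

This only helps if the matching rows fit in memory, which is why I am also considering the database route.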
nvrslnc
  • I think if you only need a merge on large data, a database is better than pandas – jezrael Jan 14 '20 at 09:27
  • Does this answer your question? [Reading a huge .csv file](https://stackoverflow.com/questions/17444679/reading-a-huge-csv-file) – Kalana Jan 14 '20 at 09:54

1 Answer


One approach could be to use dask, and in particular dask's read_csv.
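
A minimal sketch of the dask route, assuming the join column is called `key` and the big file is `big.csv` (both placeholders):

```python
import dask.dataframe as dd
import pandas as pd

# Toy stand-in for the existing in-memory DataFrame.
df = pd.DataFrame({"key": [1, 2, 3]})

big = dd.read_csv("big.csv")  # lazy: partitions are read on demand

# The join is also lazy; dask streams the csv partition by partition
# instead of materialising the whole file in memory.
joined = dd.from_pandas(df, npartitions=1).merge(big, on="key", how="left")

result = joined.compute()  # evaluate and return an ordinary pandas DataFrame
```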

In any case you may consider, as suggested by @jezrael, storing the data in an SQL database.
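
If you go the database route, a minimal sketch with the standard-library sqlite3 module could look like this (file, table, and column names are assumptions):

```python
import sqlite3
import pandas as pd

# Toy stand-in for the existing DataFrame.
df = pd.DataFrame({"key": [1, 2, 3]})

con = sqlite3.connect("join.db")

# Stream the csv into the database in chunks, so the whole file
# never has to fit in memory at once.
for chunk in pd.read_csv("big.csv", chunksize=100_000):
    chunk.to_sql("big", con, if_exists="append", index=False)

df.to_sql("small", con, if_exists="replace", index=False)

# Let the database do the left join; only the joined result
# is loaded back into memory.
result = pd.read_sql_query(
    "SELECT * FROM small LEFT JOIN big USING (key)",
    con,
)
con.close()
```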

Pierluigi