
I have a DataFrame with a column that needs to be filled with values from a big csv file. What would be the most memory- and computation-efficient way to load the csv file and left-join its data onto the DataFrame?

The approaches I have tried/considered:

  1. Load the csv file as a DataFrame and use pandas functions to join: however, loading the whole csv into memory fails with a MemoryError. A chunked variant I am considering is sketched after this list.
  2. Load the csv file into a database and use a left-join query: I have not tried this yet, but I hope it avoids the MemoryError.
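
To make approach 1 concrete, the chunked variant I have been considering looks roughly like this (file and column names are placeholders):

```python
import pandas as pd

# Toy stand-in for the existing DataFrame; "key" is a placeholder join column.
df = pd.DataFrame({"key": [1, 2, 3]})

pieces = []
for chunk in pd.read_csv("big.csv", chunksize=100_000):
    # Keep only the rows of the big file that can actually match df,
    # so the concatenated lookup table stays small.
    pieces.append(chunk[chunk["key"].isin(df["key"])])

lookup = pd.concat(pieces, ignore_index=True)
result = df.merge(lookup, on="key", how="left")
```

This only helps if the matching rows fit in memory, which is why I am also considering the database route.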
nvrslnc
  • I think if you only need a merge on large data, a database is better than pandas – jezrael Jan 14 '20 at 09:27
  • Does this answer your question? [Reading a huge .csv file](https://stackoverflow.com/questions/17444679/reading-a-huge-csv-file) – Kalana Jan 14 '20 at 09:54

1 Answer


One approach could be to use dask, and in particular dask's read_csv.
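
A minimal sketch of the dask route, assuming the join column is called `key` and the big file is `big.csv` (both placeholders):

```python
import dask.dataframe as dd
import pandas as pd

# Toy stand-in for the existing in-memory DataFrame.
df = pd.DataFrame({"key": [1, 2, 3]})

big = dd.read_csv("big.csv")  # lazy: partitions are read on demand

# The join is also lazy; dask streams the csv partition by partition
# instead of materialising the whole file in memory.
joined = dd.from_pandas(df, npartitions=1).merge(big, on="key", how="left")

result = joined.compute()  # evaluate and return an ordinary pandas DataFrame
```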

In any case you may consider, as suggested by @jezrael, storing the data in an SQL database.
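
If you go the database route, a minimal sketch with the standard-library sqlite3 module could look like this (file, table, and column names are assumptions):

```python
import sqlite3
import pandas as pd

# Toy stand-in for the existing DataFrame.
df = pd.DataFrame({"key": [1, 2, 3]})

con = sqlite3.connect("join.db")

# Stream the csv into the database in chunks, so the whole file
# never has to fit in memory at once.
for chunk in pd.read_csv("big.csv", chunksize=100_000):
    chunk.to_sql("big", con, if_exists="append", index=False)

df.to_sql("small", con, if_exists="replace", index=False)

# Let the database do the left join; only the joined result
# is loaded back into memory.
result = pd.read_sql_query(
    "SELECT * FROM small LEFT JOIN big USING (key)",
    con,
)
con.close()
```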

Pierluigi