
I need to read a very large (~30GB) .csv file and query it for specific values of interest. I tested the query code on a small dummy file and it worked, but I get a memory error when I try it on the actual large file. I think the strategy is not to read all the data at once but to process it in chunks; however, I have no coding experience, so I don't know how to do that.

Here's my code for reading in the very large file and then querying it:

import pandas as pd

# Loads the entire CSV into memory; this is what fails on the 30GB file.
synapses = pd.read_csv('c:/Users/anhdu/OneDrive/Desktop/Synapses FlyWire/flywire_buhman_wiring_v7.csv')
synapses_wanted = synapses.query('pre_pt_root_id == 720575940631147000 & post_pt_root_id == 720575940622342000')

I'm wondering if somebody could please help me with example code to do the above in chunks, so that my computer can handle it. The file has ~30 million rows. Many thanks!
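
For reference, here is a minimal sketch of the chunked approach, using the chunksize parameter of pandas' read_csv, which yields the file one piece at a time instead of loading it all at once. The chunk size of 1,000,000 rows is an assumption; tune it to your machine's available memory.

import pandas as pd

path = 'c:/Users/anhdu/OneDrive/Desktop/Synapses FlyWire/flywire_buhman_wiring_v7.csv'
matches = []

# read_csv with chunksize returns an iterator of DataFrames, so only one
# chunk of the file is held in memory at a time.
for chunk in pd.read_csv(path, chunksize=1_000_000):  # chunk size is a tunable assumption
    # Apply the same query to each chunk and keep only the matching rows.
    hits = chunk.query('pre_pt_root_id == 720575940631147000 & post_pt_root_id == 720575940622342000')
    if not hits.empty:
        matches.append(hits)

# Stitch the per-chunk matches back into a single DataFrame.
synapses_wanted = pd.concat(matches, ignore_index=True) if matches else pd.DataFrame()

Since only the matching rows are kept from each chunk, peak memory use stays near one chunk's size rather than the full 30GB.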
