
I've got relatively large CSV files (~3GB each) and am trying to read them into pandas DataFrames, but I'm unable to do this for some strange reason.

When I try to read one in via Visual Studio, the editor just restarts after a while. In Atom, it hangs, freezes, and ultimately crashes. In the terminal, I get "Killed: 9" as the error and that's about it.

Any pointers would be much appreciated.

Vash
  • Have you tried https://stackoverflow.com/questions/25962114/how-do-i-read-a-large-csv-file-with-pandas? – Paweł Kowalski May 12 '20 at 14:21
  • Can you include the output of `dmesg` around the time that you see things crash / get killed? I suspect [OOM](https://en.wikipedia.org/wiki/Out_of_memory) might be at play (the OOM killer will deliver SIGKILL / 9 to the target process) – Attie May 12 '20 at 14:21
  • The best way is to read the `CSV` in small chunks. Based on your memory size, you can decide each chunk's size and process it individually (see the chunked-read sketch after these comments). – Mayank Porwal May 12 '20 at 14:22
  • In addition to the suggestion by @makr3la, you could consider dumping it into SQLite and reading the data from there. You could even do some preprocessing or indexing there before reading into pandas (see the SQLite sketch below). – sammywemmy May 12 '20 at 14:24
  • Buy more RAM on your computer. See the [page cache](https://en.wikipedia.org/wiki/Page_cache) and http://linuxatemyram.com/ for explanations. Or read a [textbook on operating systems](http://pages.cs.wisc.edu/~remzi/OSTEP/) – Basile Starynkevitch May 12 '20 at 15:30
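
Following up on Mayank Porwal's chunking comment, here is a minimal sketch using `pandas.read_csv` with `chunksize`. The file path, chunk size, and the `value` column used in the per-chunk filter are placeholders; the idea is to reduce each chunk as you go so the full ~3 GB never has to sit in memory at once.

```python
import pandas as pd

# Hypothetical path, chunk size, and column name -- adjust for your file.
CSV_PATH = "large_file.csv"
CHUNK_SIZE = 100_000  # rows per chunk; tune to your available RAM

reduced_parts = []

# chunksize= makes read_csv return an iterator of DataFrames,
# so only one chunk is held in memory at a time instead of the whole file.
for chunk in pd.read_csv(CSV_PATH, chunksize=CHUNK_SIZE):
    # Filter/aggregate per chunk and keep only the reduced result.
    reduced_parts.append(chunk[chunk["value"] > 0])  # "value" is a placeholder column

df = pd.concat(reduced_parts, ignore_index=True)
print(df.shape)
```

If even the reduced pieces are large, write each one out incrementally (e.g. append to a file or database) instead of collecting them in a list.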

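And along the lines of sammywemmy's SQLite suggestion, a sketch that stages the CSV into a SQLite database in chunks and then reads back only the slice you need. The file names, table name, and column names here are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical file/table names -- adjust to your setup.
CSV_PATH = "large_file.csv"
DB_PATH = "large_file.db"

conn = sqlite3.connect(DB_PATH)

# Stream the CSV into SQLite in chunks so memory use stays bounded.
for chunk in pd.read_csv(CSV_PATH, chunksize=100_000):
    chunk.to_sql("data", conn, if_exists="append", index=False)

# Later, pull back only the columns/rows you actually need
# ("col_a" / "col_b" are placeholder column names).
df = pd.read_sql_query("SELECT col_a, col_b FROM data WHERE col_a > 0", conn)
conn.close()
```
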
0 Answers