
I currently have a 38 GB CSV file containing detailed order records for enzymes and proteins. Since my computer's RAM is limited to 16 GB, I am having trouble loading it into pandas. Is there any way of handling such heavy data in pandas?

I have done some research and found that Spark can handle large data sets; any comments on this would be highly appreciated.

Kaustav Sengupta
First of all, look into dask: https://github.com/dask/dask. Second of all, Stack Overflow is not the place to ask these types of questions. – TayTay Jun 26 '18 at 20:03
  • Thanks, I will definitely look into that. Great help. – Kaustav Sengupta Jun 26 '18 at 20:04
  • There are various proposed solutions in the link above, including dask, reading in columns as categories, and, as a last resort, working with the file in chunks (see the sketch after this list). – ALollz Jun 26 '18 at 20:10
  • Tgsmith61591 already answered my question. Dask is much more appropriate for this kind of work. Anyway, thanks guys. – Kaustav Sengupta Jun 26 '18 at 20:13
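
For reference, here is a minimal sketch of the two approaches the comments mention (chunked pandas reads and dask); the file name orders.csv and the column enzyme_id are hypothetical placeholders for the actual schema:

    import pandas as pd

    # Option 1: read the CSV in fixed-size chunks so only one slice is in
    # memory at a time; aggregate per chunk and combine the partial results.
    # Reading low-cardinality columns with dtype="category" also cuts memory.
    totals = {}
    for chunk in pd.read_csv("orders.csv", chunksize=1_000_000,
                             dtype={"enzyme_id": "category"}):
        for key, val in chunk["enzyme_id"].value_counts().items():
            totals[key] = totals.get(key, 0) + val

    # Option 2: let dask partition the file and run the same computation
    # out of core; nothing is read until .compute() is called.
    import dask.dataframe as dd

    ddf = dd.read_csv("orders.csv")
    result = ddf["enzyme_id"].value_counts().compute()

dask keeps the familiar pandas API while scheduling the work chunk by chunk, which is why the comments recommend it over Spark for a single machine.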

0 Answers