
I currently have a 38 GB CSV file containing detailed order records for enzymes and proteins. Since my computer's RAM is limited to 16 GB, I am having trouble loading it into pandas. Is there any way of handling such heavy data in pandas?

I have done some research and found that Spark can handle large data sets; any comments on this would be highly appreciated.

Kaustav Sengupta
First of all, look into dask: https://github.com/dask/dask. Second of all, Stack Overflow is not the place to ask these types of questions. – TayTay Jun 26 '18 at 20:03
  • Thanks, I will definitely look into that. Great help. – Kaustav Sengupta Jun 26 '18 at 20:04
  • There are various proposed solutions in the link above, including dask, reading in columns as categories, and, as a last resort, working with the file in chunks (see the sketch after this list). – ALollz Jun 26 '18 at 20:10
  • Tgsmith61591 already answered my question. Dask is much more appropriate for this kind of work. Anyway, thanks guys. – Kaustav Sengupta Jun 26 '18 at 20:13
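
For reference, here is a minimal sketch of the two approaches the comments mention (chunked pandas reads and dask); the file name orders.csv and the column enzyme_id are hypothetical placeholders for the actual schema:

    import pandas as pd

    # Option 1: read the CSV in fixed-size chunks so only one slice is in
    # memory at a time; aggregate per chunk and combine the partial results.
    # Reading low-cardinality columns with dtype="category" also cuts memory.
    totals = {}
    for chunk in pd.read_csv("orders.csv", chunksize=1_000_000,
                             dtype={"enzyme_id": "category"}):
        for key, val in chunk["enzyme_id"].value_counts().items():
            totals[key] = totals.get(key, 0) + val

    # Option 2: let dask partition the file and run the same computation
    # out of core; nothing is read until .compute() is called.
    import dask.dataframe as dd

    ddf = dd.read_csv("orders.csv")
    result = ddf["enzyme_id"].value_counts().compute()

dask keeps the familiar pandas API while scheduling the work chunk by chunk, which is why the comments recommend it over Spark for a single machine.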

0 Answers