
I have a large dataset (a CSV file) that I am reading into my Python environment. I need to split the dataset when it reaches 1 GB. My initial dataset is around 1.8 GB once read, so I should end up with two datasets: one of 1 GB and another with the remainder.

How can I do this?

The solution should consider both time and space complexity.
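A minimal sketch of one way to do this, assuming the data fits in memory as a pandas DataFrame and that rows are roughly uniform in size (the file name `data.csv` is a placeholder):

```python
import pandas as pd

ONE_GB = 1024 ** 3  # split threshold in bytes

# Hypothetical path; replace with your actual CSV file.
df = pd.read_csv("data.csv")

# Estimate the average in-memory size of one row from the
# deep memory usage (includes the contents of object columns).
total_bytes = df.memory_usage(deep=True).sum()
rows_per_gb = int(ONE_GB // (total_bytes / len(df)))

# Slice once; the cost is proportional to the rows copied, and
# nothing beyond the two resulting frames is allocated.
first_part = df.iloc[:rows_per_gb]
second_part = df.iloc[rows_per_gb:]
```

If holding the full 1.8 GB in memory at once is the concern, `pd.read_csv(..., chunksize=...)` reads the file in fixed-size row batches instead, so only one chunk is resident at a time.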

Payal Bhatia
  • Does this answer your question? [How to estimate how much memory a Pandas' DataFrame will need?](https://stackoverflow.com/questions/18089667/how-to-estimate-how-much-memory-a-pandas-dataframe-will-need) – endive1783 Jul 13 '22 at 11:40
  • Yes, to a certain extent. I found one strange thing: when I read it using the memory_usage() function (sum(data.memory_usage(deep=True))) and add up all the numbers, it comes to around 4.9 GB. However, the actual file is just 1.3 GB? – Payal Bhatia Jul 13 '22 at 13:12
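Regarding the discrepancy in the last comment: `memory_usage(deep=True)` counts the full Python object overhead of every string cell (object dtype), so the in-memory size of a DataFrame is routinely several times its on-disk CSV size. A small illustration, assuming CPython and pandas:

```python
import pandas as pd

df = pd.DataFrame({"s": ["abc"] * 1_000_000})

# Shallow: only the 8-byte pointer per cell is counted.
print(df.memory_usage().sum())           # ~8 MB
# Deep: the size of each Python string object is added,
# roughly 50 bytes of overhead per short string on CPython.
print(df.memory_usage(deep=True).sum())  # ~60 MB
```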

0 Answers