
I have a process that writes a DataFrame out to a pickle file using the standard df.to_pickle method:

import pandas as pd

# sample data
data = {'col1': [1, 2, 3, 4, 5],
        'col2' : ['a', 'b', 'c', 'd', 'e']}

# create the dataframe
df = pd.DataFrame(data)
print(df)

   col1 col2
0     1    a
1     2    b
2     3    c
3     4    d
4     5    e

# write the dataframe out to a pickle file
df.to_pickle('myPickleFile.p')

Now I have a second (separate) process that needs to read and process that file in chunks, for memory reasons, since my data is extremely large. If this were, say, a text file or an HDF file, I'd usually do something like the following:

for chunk in pd.read_csv('myCSVFile.csv', chunksize=1000000):
    # do stuff, for example:
    print(len(chunk))

The key reason I'm keen to keep the file in pickle format is the read/write speed compared to text or HDF files; in my case it's more than 300% faster.

It seems that I can't do that with read_pickle, as it doesn't support reading in chunks.
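(Not from the thread, just a sketch: if you control the writing process, one workaround is to bypass df.to_pickle and instead dump the DataFrame chunk by chunk into a single file with repeated pickle.dump calls, then read it back with repeated pickle.load calls until EOF. The file name, chunk size, and helper function below are all hypothetical, for illustration only.)

```python
import pickle
import pandas as pd

# hypothetical small example; in practice the dataframe would be large
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': ['a', 'b', 'c', 'd', 'e']})
chunk_size = 2  # hypothetical chunk size

# writer: dump each chunk as a separate pickle record in one file
with open('myChunkedPickle.p', 'wb') as f:
    for start in range(0, len(df), chunk_size):
        pickle.dump(df.iloc[start:start + chunk_size], f)

# reader: repeatedly load records until end-of-file
def read_pickle_chunks(path):
    with open(path, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

for chunk in read_pickle_chunks('myChunkedPickle.p'):
    # do stuff, for example:
    print(len(chunk))
```

This only helps if the writer can be changed; a file already written as one df.to_pickle record is a single pickle object and still has to be loaded in one go.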

So my question is: is there a way to read a pickle file in chunks into pandas? If yes, please point me in the right direction.

Thanks.

Mit
  • You can't read a pkl file in chunks: https://stackoverflow.com/questions/59983073/how-to-load-pickle-file-in-chunks – bbd108 Feb 11 '21 at 04:02
  • @bbd108 Thanks. I think I've come across that question while looking for a way. Then kabanus's comment suggested there may be a way depending on the data and how it's being pickled, hence I asked my question. But yeah, so far I think you may be right; I just couldn't find a way of doing it. Thanks again. – Mit Feb 11 '21 at 11:11
  • @kabanus i couldn’t mention two users in one comment. – Mit Feb 11 '21 at 11:12

0 Answers