I have a CSV file containing 162 GB of data, which I compressed with 7-Zip to save space. I have been using libarchive to read the .7z file block by block, appending the blocks to build the final result at the end. The problem is that the file is so huge that I cannot append everything into a single string or DataFrame, since my main memory is limited to 8 GB. I also cannot process each block on its own, because the blocks are inconsistent: the last line of each block clips off some of the columns.
Following is the snippet I am using to read the CSV file:
import libarchive

# Print each decompressed block; a block boundary usually falls
# mid-row, so the last line of every block is clipped.
with libarchive.file_reader(r'D:\Features\four_grams.7z') as e:
    for entry in e:
        for b in entry.get_blocks():
            print(b.decode('utf-8'))
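To make the boundary problem concrete, this is the direction I imagine might work: carry the clipped tail of each block over into the next one, so that only complete rows are ever decoded. This is a rough, untested sketch; process_row is a placeholder of mine for whatever per-row work is needed, and it assumes no quoted field contains an embedded newline:

import libarchive

def process_row(row):
    ...  # placeholder: handle one complete CSV row

buffer = b''
with libarchive.file_reader(r'D:\Features\four_grams.7z') as archive:
    for entry in archive:
        for block in entry.get_blocks():
            buffer += block
            # keep everything after the last newline for the next block
            complete, _, buffer = buffer.rpartition(b'\n')
            if complete:
                for line in complete.split(b'\n'):
                    process_row(line.decode('utf-8'))
        if buffer:
            # the final row has no trailing newline, so flush it here
            process_row(buffer.decode('utf-8'))
            buffer = b''

(Splitting on b'\n' before decoding should be safe for UTF-8, since multi-byte sequences never contain the newline byte.)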
Following is a pastebin of a single block of output:
Notice the clipping of the final row.
I would appreciate any help with reading complete chunks of rows from a huge, archived CSV file.
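For reference, the other direction I have been considering is wrapping the block generator in a file-like object so that pandas.read_csv can pull fixed-size chunks of complete rows. Again only a sketch under my own assumptions: ArchiveStream is a hypothetical wrapper of mine, and the chunksize of 100,000 is arbitrary:

import io
import libarchive
import pandas as pd

class ArchiveStream(io.RawIOBase):
    """Expose decompressed archive blocks as a readable binary stream."""
    def __init__(self, blocks):
        self._blocks = blocks
        self._leftover = b''

    def readable(self):
        return True

    def readinto(self, b):
        # accumulate blocks until we can fill the caller's buffer or run out
        while len(self._leftover) < len(b):
            try:
                self._leftover += next(self._blocks)
            except StopIteration:
                break
        n = min(len(b), len(self._leftover))
        b[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n  # 0 signals end of stream

with libarchive.file_reader(r'D:\Features\four_grams.7z') as archive:
    for entry in archive:
        stream = io.BufferedReader(ArchiveStream(iter(entry.get_blocks())))
        # each chunk is a complete DataFrame of at most 100,000 rows
        for chunk in pd.read_csv(stream, chunksize=100_000):
            ...  # process each chunk, e.g. aggregate or write back out

If either approach is roughly right, or if there is a cleaner way to stream rows out of libarchive, I would like to know.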