I have a huge compressed file from which I want to read the individual dataframes one at a time, so as not to run out of memory.
Also, due to time and disk-space constraints, I can't extract the .tar.gz.
This is the code I have so far:
import io
import tarfile

import pandas as pd

# tarfile lets us navigate a compressed archive
# without extracting its contents to disk
tar_file = tarfile.open(r'\\path\to\the\tar\file.tar.gz')

# Iterate over the CSV files contained in the archive,
# yielding (filename, dataframe) pairs
def generate_individual_df(tar_file):
    return (
        (
            member.name,
            pd.read_csv(io.StringIO(tar_file.extractfile(member).read().decode('ascii')), header=None)
        )
        for member in tar_file
        if member.isreg()
    )
for filename, dataframe in generate_individual_df(tar_file):
    ...  # But here dataframe is the whole member file, which is too big
I tried the approach from How to create Panda Dataframe from csv that is compressed in tar.gz? but still can't solve it ...
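One possible approach (a sketch, not tested against your data): instead of decoding each member into one giant in-memory string, pass the file-like object returned by `extractfile()` directly to `pd.read_csv` together with the `chunksize` parameter, which makes pandas yield the member in fixed-size chunks. The function name `generate_chunks` and the chunk size of 10,000 rows are my own choices for illustration:

```python
import tarfile

import pandas as pd

def generate_chunks(tar_path, chunksize=10_000):
    """Yield (member name, chunk DataFrame) pairs, one chunk at a time."""
    with tarfile.open(tar_path, mode='r:gz') as tar:
        for member in tar:
            if not member.isreg():
                continue
            # extractfile() returns a file-like object; pandas can read
            # from it directly, so the member is streamed rather than
            # loaded into memory as a single decoded string
            fileobj = tar.extractfile(member)
            for chunk in pd.read_csv(fileobj, header=None, chunksize=chunksize):
                yield member.name, chunk
```

Each `chunk` is a DataFrame of at most `chunksize` rows, so peak memory stays bounded regardless of how large the individual CSVs inside the archive are.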