I'm new to pandas and could use your help.
I have two files that I need to merge on some columns, and one of them is really big (100 GB+). I have to skip some metadata lines at the top of the big file, so I pass an open file object (a buffer) to read_csv.
First, I tried pandas. However, when I opened the file this way, the process was killed by the OS (out of memory):
with open(self.all_file, 'r') as f:
    # skip leading metadata lines that start with '##'
    pos = f.tell()
    line = f.readline()
    while line.startswith('##'):
        pos = f.tell()
        line = f.readline()
    # rewind to the first non-metadata line (the column header)
    f.seek(pos)
    return pd.read_csv(f, sep='\t')
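Not part of the original question, but for context: one way to keep memory bounded with plain pandas is to pass chunksize, so read_csv returns an iterator of DataFrames instead of loading everything at once. A minimal sketch of that idea (the function name and the standalone-path form are my own, and the caller is responsible for exhausting the iterator):

```python
import pandas as pd

def read_tsv_skipping_metadata(path, chunksize=100_000):
    """Return an iterator of DataFrame chunks, skipping leading '##' lines."""
    f = open(path, "r")
    pos = f.tell()
    line = f.readline()
    while line.startswith("##"):
        pos = f.tell()
        line = f.readline()
    # rewind to the first non-metadata line (the column header)
    f.seek(pos)
    # chunksize makes read_csv return an iterator, so only one
    # chunk is held in memory at a time
    return pd.read_csv(f, sep="\t", chunksize=chunksize)
```

Each iteration then yields a DataFrame of at most chunksize rows, which can be processed and discarded before the next one is read.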
Afterwards, I tried dask instead of pandas, but dask's read_csv can't take a buffer as input, so it fails:

return dd.read_csv(f, sep='\t')
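This isn't from the question, but one possible workaround: since dd.read_csv expects a path (or glob) rather than an open file, the '##' lines could be counted up front and the count passed via skiprows together with the plain path. A sketch of the counting helper (the name is mine; the dask call is shown commented out, and whether your dask version accepts skiprows this way should be verified against its docs):

```python
def count_metadata_lines(path, prefix="##"):
    """Count the leading lines of a file that start with the given prefix."""
    n = 0
    with open(path) as f:
        for line in f:
            if not line.startswith(prefix):
                break
            n += 1
    return n

# n = count_metadata_lines(self.all_file)
# import dask.dataframe as dd
# df = dd.read_csv(self.all_file, sep="\t", skiprows=n)  # path, not buffer
```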
How can I open the large file as a buffer (or otherwise skip its metadata lines) and merge the two dataframes without running out of memory?
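One pattern I've been considering (a sketch, not something I have working; the key column and function name are placeholders) is to stream the big file in chunks and merge each chunk against the smaller table, assuming the smaller file fits in memory:

```python
import pandas as pd

def merge_big_with_small(big_chunks, small_df, on, how="inner"):
    """Merge an iterator of DataFrame chunks against an in-memory table."""
    pieces = [chunk.merge(small_df, on=on, how=how) for chunk in big_chunks]
    return pd.concat(pieces, ignore_index=True)
```

With an inner merge only matching rows survive, so the result is often far smaller than the 100 GB input; with a left merge on the big side the output can be as large as the input, and each merged piece would need to be written out (e.g. appended to a file) instead of concatenated in memory.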
Thank you!