You need to use the chunksize
parameter of pandas.read_csv. See for example How to read a 6 GB csv file with pandas.
You will process N rows at a time, without loading the whole dataframe into memory. N
will depend on your machine: a lower N uses less memory, but it increases the run time and the I/O load.
import pandas as pd

# create a reader that yields the file 100 rows at a time
reader = pd.read_csv('bigfile.tsv', sep='\t', header=None, chunksize=100)
# process the chunks one at a time
for chunk in reader:
    result = chunk.melt()
    # append the melted rows of each chunk to a new file
    result.to_csv('bigfile_melted.tsv', header=False, index=False, sep='\t', mode='a')
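Note that mode='a' appends to the output file, so delete any existing bigfile_melted.tsv before re-running the loop, otherwise the results will be duplicated.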
Furthermore, you can pass the argument dtype=np.int32
to read_csv
if your data is integer, or dtype=np.float32 for floating-point data,
to reduce memory usage and process the data faster if you do not need 64-bit precision.
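For example, a minimal sketch assuming every column in the file is numeric:

import numpy as np
import pandas as pd

# parse every column as 32-bit floats instead of the default float64
reader = pd.read_csv('bigfile.tsv', sep='\t', header=None,
                     dtype=np.float32, chunksize=100)
chunk = reader.get_chunk()
print(chunk.dtypes.unique())  # float32: half the memory per value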
NB: you can find examples of memory usage in Using Chunksize in Pandas.
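If you want to check the effect on your own data, a quick sketch, assuming the reader defined above:

# measure the in-memory size of one chunk, in bytes
chunk = reader.get_chunk()
print(chunk.memory_usage(deep=True).sum())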