Insert a row in pd.DataFrame without loading the file

Question

The following code works for inserting a row (the feature names) into my dataset as the first row:

import pandas as pd

features = ['VendorID', 'mta_tax', 'tip_amount', 'tolls_amount', 'improvement_surcharge', 'total_amount']

# header=None assumes data.csv has no header line, so its first line stays a data row
df = pd.read_csv(path + 'data.csv', sep=',', header=None)
df.loc[-1] = features  # adding a row
df.index = df.index + 1  # shifting index
df = df.sort_index()  # sorting by index
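The three-step insert above (add a row at label -1, shift the index, sort) can be checked on a tiny in-memory frame. This is a minimal sketch assuming the file has no header line (hence `header=None`); the `io.StringIO` data and the shortened `features` list are made up for illustration. Note that it still reads the whole file into memory, so it does not solve the 10 GB problem.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: three columns, no header line.
raw = io.StringIO("1,0.5,2.0\n2,0.5,0.0\n")
features = ['VendorID', 'mta_tax', 'tip_amount']

# header=None keeps the first line as data instead of using it as column names.
df = pd.read_csv(raw, header=None)
df.loc[-1] = features        # new row gets index label -1
df.index = df.index + 1      # labels become 1, 2 (old rows) and 0 (new row)
df = df.sort_index()         # the new row moves to the top

print(df.iloc[0].tolist())   # ['VendorID', 'mta_tax', 'tip_amount']
```

Because the feature names are strings, the numeric columns are upcast to `object` dtype once this row is added.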

However, data.csv is very large (about 10 GB), so I am wondering whether I can insert the features row directly into the file without loading it. Is that possible?

Thank you

score 1 · Answer 1 · answered May 02 '18 at 17:40


You don't have to load the entire file into memory: use the standard library's csv module to append a row to the end of the file. Note that this puts the row at the bottom of the file, not at the top where a header line belongs.

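import csv
import os

# path and features are the variables from the question.
# newline='' is what the csv docs recommend for files passed to csv.writer.
with open(os.path.join(path, 'data.csv'), 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(features)  # lands on the last line, after the existing data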
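To see where an appended row ends up, here is a small self-contained check on a temporary file. The file contents and the shortened `features` list are made up for illustration.

```python
import csv
import os
import tempfile

features = ['VendorID', 'mta_tax', 'tip_amount']  # shortened list for the demo

with tempfile.TemporaryDirectory() as path:
    csv_path = os.path.join(path, 'data.csv')
    # Hypothetical stand-in for the large file: two data rows, no header.
    with open(csv_path, 'w', newline='') as f:
        f.write('1,0.5,2.0\n2,0.5,0.0\n')

    # Append mode only writes new bytes after the existing content.
    with open(csv_path, 'a', newline='') as f:
        csv.writer(f).writerow(features)

    with open(csv_path, newline='') as f:
        lines = f.read().splitlines()

print(lines[0])   # 1,0.5,2.0
print(lines[-1])  # VendorID,mta_tax,tip_amount
```

The features row becomes the last line, so anything that expects a header on line 1 (including `pd.read_csv` with its default header handling) will not pick it up.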

cs95


@Ste. That is extremely difficult to do without moving all the existing 10GB of data around. – cs95 May 02 '18 at 17:43
I see. Thank you so much for your post – steve May 02 '18 at 17:44

@Ste. Here is [another post](https://stackoverflow.com/questions/5914627/prepend-line-to-beginning-of-a-file) on the topic, and almost all the answers involve moving data around. Good luck! – cs95 May 02 '18 at 17:45
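Putting a line at the top does mean rewriting the file, but it does not require loading it into memory. A minimal sketch, not taken from the thread: write the header into a new file, stream the original after it in fixed-size chunks with `shutil.copyfileobj`, then swap the new file in with `os.replace`. The function name `prepend_header` and the 16 MiB chunk size are arbitrary choices; it assumes there is roughly the file's size in free disk space next to the original.

```python
import os
import shutil
import tempfile


def prepend_header(csv_path, header_line, chunk_size=16 * 1024 * 1024):
    """Write header_line followed by the original file's bytes, then replace the original.

    Memory use is bounded by chunk_size, but every byte of the file is still copied.
    """
    directory = os.path.dirname(os.path.abspath(csv_path))
    # Create the temp file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as dst, open(csv_path, 'rb') as src:
            dst.write(header_line.encode('utf-8'))
            shutil.copyfileobj(src, dst, chunk_size)  # streams in fixed-size chunks
        os.replace(tmp_path, csv_path)  # atomic rename on the same filesystem
    except BaseException:
        os.remove(tmp_path)
        raise


# Demo on a small temporary file standing in for data.csv.
features = ['VendorID', 'mta_tax', 'tip_amount']
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'data.csv')
    with open(csv_path, 'w', newline='') as f:
        f.write('1,0.5,2.0\n2,0.5,0.0\n')
    # A plain join is fine here because no feature name contains a comma or quote.
    prepend_header(csv_path, ','.join(features) + '\n')
    with open(csv_path, newline='') as f:
        lines = f.read().splitlines()

print(lines)  # ['VendorID,mta_tax,tip_amount', '1,0.5,2.0', '2,0.5,0.0']
```

For the question's file this would be called as `prepend_header(os.path.join(path, 'data.csv'), ','.join(features) + '\n')`. The run time is still proportional to the full 10 GB, which is the data movement the comment refers to.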
