
The following code inserts a row of feature names as the first row of my dataset:

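import pandas as pd

features = ['VendorID', 'mta_tax', 'tip_amount', 'tolls_amount', 'improvement_surcharge', 'total_amount']

df = pd.read_csv(path + 'data.csv', sep=',')  # read_csv already returns a DataFrame
df.loc[-1] = features  # add the feature names as a new row labelled -1
df.index = df.index + 1  # shift every label up by one so the new row becomes 0
df = df.sort_index()  # sort by index so the new row comes first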

But data.csv is very large (about 10 GB), so I am wondering whether I can insert the features row directly into the file without loading it. Is that possible?

Thank you

steve

1 Answer


You don't have to load the entire file into memory. Use the standard library csv module's writer to append a row to the end of the file:

import csv
import os

# newline='' lets the csv module control line endings, as its docs recommend
with open(os.path.join(path, 'data.csv'), 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(features)
cs95
  • @Ste. That is extremely difficult to do without moving all the existing 10GB of data around. – cs95 May 02 '18 at 17:43
  • I see. Thank you so much for your post – steve May 02 '18 at 17:44
  • @Ste. Here is [another post](https://stackoverflow.com/questions/5914627/prepend-line-to-beginning-of-a-file) on the topic, and almost all the answers involve moving data around. Good luck! – cs95 May 02 '18 at 17:45
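
To illustrate what "moving data around" can look like without loading the file into memory, here is a minimal sketch, not taken from the answer above. It writes the new first line to a temporary file, streams the existing contents after it in fixed-size chunks, and then swaps the new file into place. The helper name prepend_line and the 1 MiB chunk size are illustrative assumptions, and joining the names with ',' assumes none of them needs CSV quoting. Memory use stays small, but the whole file is still rewritten, so it needs roughly as much free disk space as the original file.

```python
import os
import shutil
import tempfile


def prepend_line(csv_path, line, chunk_size=1024 * 1024):
    """Hypothetical helper: put `line` at the top of csv_path by rewriting the file."""
    directory = os.path.dirname(os.path.abspath(csv_path))
    # Create the temp file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as dst, open(csv_path, 'rb') as src:
            dst.write(line.encode('utf-8') + b'\n')
            # Copy the old contents chunk by chunk; only chunk_size bytes are held at once.
            shutil.copyfileobj(src, dst, chunk_size)
        # Swap the new file into place under the original name.
        os.replace(tmp_path, csv_path)
    except BaseException:
        # Clean up the partial temp file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Example call, assuming `path` and `features` are defined as in the question:
# prepend_line(os.path.join(path, 'data.csv'), ','.join(features))
```

The copy works in binary mode so the existing bytes are passed through untouched, whatever the file's encoding or line endings are.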