-1

I need to skip the first few lines of a CSV file and save the rest to another file.

The code I currently use for such tasks is:

import pandas as pd
df = pd.read_csv('users.csv', skiprows=2)    
df.to_csv("usersOutput.csv", index=False)

This works without issues, but it reads the whole file into memory before saving. I now have to process a 4 GB file, and I expect this approach to be very slow.

Is there a way to skip the first few lines and save the file without reading all of it first?

Evgeniy

2 Answers

3

You don't need to use pandas just to filter lines from a file:

with open('users.csv') as users, open('usersOutput.csv', 'w') as output:
    for lineno, line in enumerate(users):
        if lineno > 1:
            output.write(line)
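
The same idea can be wrapped in a reusable function. This is only a sketch: the name skip_first_lines and its parameters are made up for illustration, and it assumes no quoted CSV field in the skipped header contains an embedded newline (such a field would count as more than one line).

```python
from itertools import islice


def skip_first_lines(src_path, dst_path, n_skip):
    """Copy src_path to dst_path, dropping the first n_skip lines."""
    # newline='' disables newline translation, so the original line
    # endings are written back unchanged on every platform.
    with open(src_path, newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        # islice(src, n_skip, None) lazily yields the lines starting at
        # index n_skip, so the file is streamed line by line instead of
        # being loaded as a whole.
        dst.writelines(islice(src, n_skip, None))
```

Calling skip_first_lines('users.csv', 'usersOutput.csv', 2) should give the same result as the loop above.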
Peter Wood
2

The most efficient way is to use shutil.copyfileobj(fsrc, fdst[, length]):

from shutil import copyfileobj
from itertools import islice

with open('users.csv') as f_old, open('usersOutput.csv', 'w') as f_new:
    list(islice(f_old, 2))   # consume (skip) the first 2 lines
    copyfileobj(f_old, f_new)

From the docs:

... if the current file position of the fsrc object is not 0, only the contents from the current file position to the end of the file will be copied.

i.e. the new file will contain the same content except the first 2 lines.
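
A variant that may suit a multi-gigabyte file is to open both files in binary mode, so nothing is decoded or re-encoded on the way through. The sketch below is an illustration rather than a benchmarked recommendation: copy_without_header, n_skip and buffer_size are hypothetical names, and it assumes lines end in \n or \r\n (binary readline() splits on \n only).

```python
from shutil import copyfileobj


def copy_without_header(src_path, dst_path, n_skip, buffer_size=1024 * 1024):
    """Copy src_path to dst_path, dropping the first n_skip lines."""
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # Each readline() moves the file position past one line;
        # at end of file it simply returns b'' and does nothing more.
        for _ in range(n_skip):
            src.readline()
        # copyfileobj copies from the current position to the end,
        # reading buffer_size bytes at a time.
        copyfileobj(src, dst, buffer_size)
```

Because the bytes are copied untouched, the output keeps the encoding and line endings of the input.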

RomanPerekhrest