Skip first rows from CSV with Python without reading the file

Question

I need to skip the first few lines of a CSV file and save the rest to another file.

The code I currently use for this task is:

import pandas as pd
df = pd.read_csv('users.csv', skiprows=2)    
df.to_csv("usersOutput.csv", index=False)

It works without issues. The only problem is that this code reads the whole file into memory before saving it. Now I have to deal with a 4 GB file, and I think this code will be very time-consuming.

Is it possible to skip the first few lines and save the rest of the file without reading it first?

score 3 · Accepted Answer · answered Nov 26 '19 at 15:14

You don't need to use pandas just to filter lines from a file:

with open('users.csv') as users, open('usersOutput.csv', 'w') as output:
    # enumerate counts from 0, so lines 0 and 1 (the first two) are skipped
    for lineno, line in enumerate(users):
        if lineno > 1:
            output.write(line)
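The same line-by-line idea generalizes to any number of leading lines. A minimal sketch, assuming the file decodes with the platform's default text encoding; `skip_first_lines` is a hypothetical helper name. `itertools.islice(src, n, None)` lazily drops the first `n` lines, so only one line is held in memory at a time, and `newline=''` on both files keeps the original line endings unchanged.

```python
from itertools import islice


def skip_first_lines(src_path, dst_path, n):
    """Copy src_path to dst_path, dropping the first n lines.

    Lines are streamed one at a time, so memory use does not grow
    with the file size.
    """
    # newline='' disables newline translation on read and write,
    # so '\r\n' endings are copied through untouched
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        # islice(src, n, None) yields every line from index n onward
        dst.writelines(islice(src, n, None))
```

With the file names from the question, the call would be `skip_first_lines('users.csv', 'usersOutput.csv', 2)`.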

Peter Wood

RomanPerekhrest · Answer 2 · 2019-11-28T18:59:50.603

The most efficient way is to use shutil.copyfileobj(fsrc, fdst[, length]):

from shutil import copyfileobj
from itertools import islice

with open('users.csv') as f_old, open('usersOutput.csv', 'w') as f_new:
    list(islice(f_old, 2))   # consume and discard the first 2 lines
    copyfileobj(f_old, f_new)

From the docs:

... if the current file position of the fsrc object is not 0, only the contents from the current file position to the end of the file will be copied.

That is, the new file will contain the same content except for the first 2 lines.
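A binary-mode variation of this approach, as a sketch assuming the file uses LF or CRLF line endings: opening both files with `'rb'`/`'wb'` avoids decoding and re-encoding the whole 4 GB of text, and `islice` still works because binary files also iterate line by line (splitting on `b'\n'`). The helper name `copy_without_first_lines` and its 1 MiB `chunk_size` default are illustrative choices, not part of the answer above.

```python
from itertools import islice
from shutil import copyfileobj


def copy_without_first_lines(src_path, dst_path, n, chunk_size=1024 * 1024):
    """Drop the first n lines of src_path and copy the rest to dst_path.

    Works on raw bytes, so no text decoding or encoding takes place.
    """
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # Read and discard the first n lines; afterwards the file
        # position sits at the start of line n
        for _ in islice(src, n):
            pass
        # Copy everything from the current position to EOF in chunks
        copyfileobj(src, dst, chunk_size)
```

If the file has fewer than `n` lines, the loop simply reaches EOF and the output file ends up empty.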

edited Nov 28 '19 at 18:59

answered Nov 26 '19 at 15:24

RomanPerekhrest

This solution is fast too, but not handy: one has to call `next(f_old)` once for every line to be skipped. – Evgeniy Nov 26 '19 at 15:36

@Evgeniy, that's an absolutely wrong conclusion. See my update (with `f_old.readlines()` you can skip as many lines as needed). I used `next` because your case was quite simple. – RomanPerekhrest Nov 26 '19 at 15:43
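Skipping an arbitrary number of lines does not require writing `next(f_old)` repeatedly; a loop does it. A minimal sketch with a hypothetical helper `skip_lines`, using in-memory `io.StringIO` objects in place of the real files. It calls `readline()` once per skipped line. Note that `readlines()` without an argument reads the entire remainder of the file into a list, which for a 4 GB file would defeat the purpose.

```python
import io
from shutil import copyfileobj


def skip_lines(f, n):
    """Advance file object f past its first n lines.

    Only the skipped lines are read. Stops early if the file has
    fewer than n lines and returns how many lines were skipped.
    """
    skipped = 0
    for _ in range(n):
        if not f.readline():  # an empty result means EOF
            break
        skipped += 1
    return skipped


# In-memory stand-ins for users.csv and usersOutput.csv
src = io.StringIO('meta1\nmeta2\nid,name\n1,Ann\n')
dst = io.StringIO()
skip_lines(src, 2)
copyfileobj(src, dst)
# dst.getvalue() == 'id,name\n1,Ann\n'
```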

This is a great answer, very efficient. – Peter Wood Nov 26 '19 at 19:35

@PeterWood, thanks. I'm glad there are people who realize which way is really faster and more efficient. – RomanPerekhrest Nov 26 '19 at 19:37