Strip the First Four Rows of a CSV in Python?

Question

I have a batch of 50-60 CSV files which, for whatever reason, have total junk data in the first four rows of each file. After the junk data, however, the column headers are properly listed, and the rest of the file is fine. How could I go about stripping these first four rows from each file in Python? Here is my code thus far:

import csv
total = open('C:\\Csv\\201.csv', 'rb')
for row in csv.reader(total):
    print row

As you can see, all I have done is open the file and print its contents. I have searched around for ways to delete parts of CSV files, but most either delete entire columns or hinge on a particular condition for the row to be deleted. In my case, it is simply a matter of order: every file needs to be stripped of its first four rows. Any and all help is greatly appreciated.

Joel Cornett · Answer 1 · 2013-02-23T22:54:56.793

8

You could do:

reader = csv.reader(total)
# Caveat: all() stops early if one of the rows is empty (a blank line parses as []).
all(next(reader) for i in range(4))

or

for i in range(4): next(reader)
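
For the whole batch, the same skip can be wrapped in a loop that writes a cleaned copy of each file. The following is a minimal sketch, assuming Python 3; the function names strip_leading_rows and strip_folder and the separate output folder are made up for illustration and are not from the question.

```python
import csv
from pathlib import Path


def strip_leading_rows(src_path, dst_path, n=4):
    """Copy a CSV file to dst_path, leaving out its first n rows."""
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(src_path, newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        for _ in range(n):
            # The default value avoids StopIteration on files shorter than n rows.
            next(reader, None)
        csv.writer(dst).writerows(reader)


def strip_folder(src_dir, dst_dir, n=4):
    """Hypothetical batch helper: clean every *.csv in src_dir into dst_dir."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src_path in sorted(Path(src_dir).glob('*.csv')):
        strip_leading_rows(src_path, dst_dir / src_path.name, n)
```

Writing to a second folder rather than overwriting in place keeps the originals intact in case the junk turns out not to be exactly four rows in some file.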

edited Feb 23 '13 at 22:54

answered Feb 23 '13 at 19:50

Joel Cornett


That's creative. I never would have thought to use "any". – user1067257 Feb 23 '13 at 19:56

`any` only skips the first line; `all` skips all four lines. – Hai Vu Feb 23 '13 at 20:52

score 3 · Accepted Answer · answered Feb 23 '13 at 19:42


import sys
# Start counting at -4 so the first four lines get negative indices and are skipped.
for i, line in enumerate(sys.stdin, -4):
    if i >= 0: print line,
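
The same counting trick works on any open text file, not only standard input. Here is a rough sketch, assuming Python 3 and a hypothetical helper name copy_without_first_lines. Note that it skips physical lines, which matches CSV rows only when the junk rows contain no quoted line breaks.

```python
import io


def copy_without_first_lines(src, dst, n=4):
    """Write every line of src to dst except the first n."""
    # enumerate starts at -n, so the first n lines get negative indices.
    for i, line in enumerate(src, -n):
        if i >= 0:
            dst.write(line)


# Demonstration with in-memory files standing in for real ones.
source = io.StringIO("junk 1\njunk 2\njunk 3\njunk 4\nname,age\nann,3\n")
target = io.StringIO()
copy_without_first_lines(source, target)
print(target.getvalue(), end="")
```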

answered Feb 23 '13 at 19:42

newtover


score 1 · Answer 3 · answered Feb 23 '13 at 20:46

You can write a generic function to skip the first n items of any sequence:

def skip_first(seq, n):
    for i,item in enumerate(seq):
        if i >= n:
            yield item

To use it:

import csv
with open('C:\\Csv\\201.csv', 'rb') as total:
    csvreader = csv.reader(total)
    for row in skip_first(csvreader, 4):
        print row

This function is generic because it can skip over any iterable, not just a file:

# Skip the first three
dwarfs = ['happy', 'grumpy', 'doc', 'sleepy', 'bashful', 'sneezy', 'dopey']
for item in skip_first(dwarfs, 3):
    print item

score 0 · Answer 4 · answered Feb 23 '13 at 22:12

I'm surprised no one has suggested the Pythonic way of using islice here...

import csv
from itertools import islice
with open('somefile') as fin:
    # islice(iterable, 4, None) lazily drops the first four rows
    csvin = islice(csv.reader(fin), 4, None)
    for row in csvin:
        pass

example:

>>> r = range(10); list(islice(r, 4, None, None))
[4, 5, 6, 7, 8, 9]

score 0 · Answer 5 · answered Feb 19 '16 at 21:52

None of the answers seems to take the header line required by DictReader into account: if the first line contains anything other than the list of field names, DictReader won't recognize the fields and won't parse the file properly.

And because csv.reader expects a file-like object, I had to use StringIO as a temporary buffer (not a serious issue, since I usually have only about 20 rows).

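import csv
from io import StringIO

with StringIO() as csvio:
    for i, line in enumerate(myfile.iter_lines()):
        if i < 5:  # skip the junk lines before the header
            continue
        # Assumes iter_lines() yields text lines without their line endings
        csvio.write(line + '\n')

    csvio.seek(0)  # rewind so DictReader starts reading at the header line
    reader = csv.DictReader(csvio)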

I would appreciate better suggestions for creating a file-like object that holds all lines except the first N without buffering them all in memory.
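
One possible way to avoid the buffer: per the csv module documentation, csv.reader and csv.DictReader accept any iterable that yields lines of text, not only file objects, so the leading lines can be dropped lazily with itertools.islice. A minimal sketch, assuming Python 3 and text (not bytes) lines; the helper name dict_rows_after is invented here.

```python
import csv
import io
from itertools import islice


def dict_rows_after(lines, n):
    """Return a DictReader that treats line n (0-based) as the header."""
    # islice drops the first n lines lazily; nothing is copied into memory.
    return csv.DictReader(islice(lines, n, None))


# Any iterable of text lines works: an open file, a list, a generator.
sample = io.StringIO("junk\njunk\nname,age\nann,3\nbob,5\n")
rows = list(dict_rows_after(sample, 2))
print(rows[0]["name"], rows[1]["age"])
```

If the source yields bytes, each line would need decoding first, for example with a generator expression wrapped around the iterable.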

score 0 · Answer 6 · edited Jun 01 '20 at 22:43


I'm surprised no one mentioned the skiprows parameter of the pandas read_csv function.

import pandas as pd
df = pd.read_csv('somefile.csv', skiprows=4)

Check the file to see which row contains the header and set `skiprows` accordingly: a value of k removes the first k rows.
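
To illustrate how skiprows interacts with the header, here is a small sketch that uses an in-memory file in place of a real path; the sample data is invented. With skiprows=4, the fifth line of the input becomes the header row.

```python
import io

import pandas as pd

# Four junk lines, then the real header and data (made-up sample).
raw = "junk\njunk\njunk\njunk\nname,age\nann,3\nbob,5\n"

df = pd.read_csv(io.StringIO(raw), skiprows=4)
print(list(df.columns))
print(len(df))
```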

edited Jun 01 '20 at 22:43

Aman Srivastava


answered Jun 01 '20 at 17:21

Dras227


score 0 · Answer 7 · answered Feb 02 '21 at 13:36


This is what I would do to skip the first four rows in the file:

import pandas as pd
df = pd.read_csv("C:/Users//...", skiprows=4)

answered Feb 02 '21 at 13:36

Stan

