2

I have a batch of 50-60 csv files which, for whatever reason, have total junk data in the first four rows of each file. After the junk data, however, the column headers are properly listed, and the rest of the file is fine. How could I go about stripping these first four rows from each file in Python? Here is my code thus far:

import csv
total = open('C:\\Csv\\201.csv', 'rb')
for row in csv.reader(total):
    print row

As you can see, all I have done is opened the file and printed its contents. I have searched around for solutions of deleting certain aspects of csv files, but most either delete entire columns, or hinge on a particular condition for the row to be deleted. In my case, it is simply a matter of order, and every file needs to be stripped of its first four rows. Any and all help is greatly appreciated.

Unihedron
user1067257

7 Answers

8

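You could do:

reader = csv.reader(total)
for i in range(4):
    next(reader)

A one-liner such as all(next(reader) for i in range(4)) also works when none of the four rows is blank, but all() stops at the first falsy value and csv.reader yields an empty list for a blank line, so the explicit loop is safer.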
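
To apply this to the whole batch and write the cleaned files back out, something along these lines should work. It is only a sketch, written in Python 3 syntax; the function name strip_leading_rows, its arguments and the *.csv pattern are assumptions for illustration and do not come from the question's code.

```python
import csv
import glob
import os


def strip_leading_rows(src_dir, dst_dir, n=4):
    """Copy every .csv in src_dir to dst_dir without its first n rows.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for src_path in sorted(glob.glob(os.path.join(src_dir, '*.csv'))):
        dst_path = os.path.join(dst_dir, os.path.basename(src_path))
        # newline='' lets the csv module handle line endings itself.
        with open(src_path, newline='') as src, \
                open(dst_path, 'w', newline='') as dst:
            reader = csv.reader(src)
            for _ in range(n):
                next(reader, None)  # default avoids an error on short files
            # Whatever is left (header plus data) is copied unchanged.
            csv.writer(dst).writerows(reader)
        written.append(dst_path)
    return written
```

Writing to a separate directory keeps the original files intact in case the junk turns out not to be exactly four rows in every file.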
Joel Cornett
3
import sys
# Counting from -4 gives the four junk lines negative indices.
for i, line in enumerate(sys.stdin, -4):
    if i >= 0: print line,
newtover
1

You can write a generic function to skip the first n items of any sequence:

def skip_first(seq, n):
    for i,item in enumerate(seq):
        if i >= n:
            yield item

To use it:

import csv
with open('C:\\Csv\\201.csv', 'rb') as total:
    csvreader = csv.reader(total)
    for row in skip_first(csvreader, 4):
        print row

This function is generic because it can skip over the start of any iterable, not just a file:

# Skip the first three
names = ['happy', 'grumpy', 'doc', 'sleepy', 'bashful', 'sneezy', 'dopey']
for item in skip_first(names, 3):
    print item
Hai Vu
0

I'm surprised no one has suggested the Pythonic way of using islice here...

import csv
from itertools import islice

with open('somefile') as fin:
    # Start at index 4, i.e. skip the first four rows.
    csvin = islice(csv.reader(fin), 4, None)
    for row in csvin:
        pass  # process each remaining row here

example:

>>> r = range(10); list(islice(r, 4, None, None))
[4, 5, 6, 7, 8, 9]
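
Applied to CSV data, the same slice leaves the header as the first remaining row. A small sketch in Python 3 syntax, using made-up in-memory sample data in place of a real file:

```python
import csv
import io
from itertools import islice

# Made-up sample: four junk lines, then the header and two data rows.
sample = io.StringIO(
    "junk 1\njunk 2\njunk 3\njunk 4\n"
    "id,name\n1,happy\n2,grumpy\n"
)

rows = list(islice(csv.reader(sample), 4, None))
header, data = rows[0], rows[1:]
print(header)  # ['id', 'name']
print(data)    # [['1', 'happy'], ['2', 'grumpy']]
```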
Jon Clements
0

None of the answers seem to take into account the header line required by DictReader: if the first line contains anything other than the list of field names, DictReader won't recognize them and won't parse the file properly.

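I used StringIO as a temporary buffer, since I wanted a file-like object to hand to the csv module (not a serious issue, I usually have about 20 rows there).

import csv
from io import StringIO

with StringIO() as csvio:
    for i, line in enumerate(myfile.iter_lines()):
        if i < 5:
            continue  # skip the junk lines before the header
        # Assumes iter_lines() yields text lines without their line endings
        csvio.write(line + '\n')

    csvio.seek(0)  # rewind before reading back what was written
    reader = csv.DictReader(csvio)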

I would appreciate better suggestions on how to create a file-like object for all the lines except the first N without buffering it all in memory.
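
One option that may avoid the buffer: csv.DictReader accepts any iterable that yields lines, so the line iterator can be sliced with itertools.islice and passed in directly. This is a sketch in Python 3 syntax, assuming the junk occupies exactly the first n physical lines; the function name dict_rows_after and the sample data are made up for illustration.

```python
import csv
import io
from itertools import islice


def dict_rows_after(lines, n):
    """Return a DictReader over lines, ignoring the first n lines.

    Hypothetical helper; lines is any iterable of strings, such as an
    open text file. Lines are consumed lazily, one at a time.
    """
    return csv.DictReader(islice(lines, n, None))


# Made-up sample: five junk lines, then a header and one data row.
sample = io.StringIO("junk\njunk\njunk\njunk\njunk\nid,name\n1,doc\n")
for row in dict_rows_after(sample, 5):
    print(row['id'], row['name'])
```

Because islice pulls lines only as the reader asks for them, the whole input never has to sit in memory at once.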

Ivan Anishchuk
0

I'm surprised no one mentioned the skiprows parameter of pandas' read_csv function.

import pandas as pd
df = pd.read_csv('somefile.csv', skiprows=4)

Check which row of the file holds the header and set skiprows accordingly: with skiprows=k, the first k rows are dropped.

Aman Srivastava
Dras227
0

This is what I would do to skip the first four rows in the file, using pandas:

df = pd.read_csv("C:/Users//...",skiprows=4)
Stan