Python CSV reader to skip 9 headers

Question

import os
import csv

def get_file_path(filename):
    # Build the path relative to the current working directory.
    file_path = os.path.join(os.getcwd(), filename)
    print(file_path)
    return file_path

path = get_file_path('Invoice-Item.csv')

def read_csv(filepath):
    with open(filepath, 'r') as csvfile:
        reader = csv.reader(csvfile)
        for i in range(0, 9):            
            next(reader, None)        
        for row in reader:
            print(row[0])                   

read_csv(path)

I am looking for a technique to skip the 9 header lines without using the range function. Any help would be appreciated. Below is a sample of the CSV file:

Summary Journal Entry,JE-00000060
Journal Entry Date,28/02/2015
Accounting Period,Feb-15
Accounting Period Start,1/02/2015
Accounting Period End,28/02/2015
Included Transaction Types,Invoice Item
Included Time Period,01/02/2015-09/02/2015
Journal Run,JR-00000046
Segments,
,
Customer Account Number,Transaction Amount
210274174,545.45
210274174,909.09
210274174,909.09
210274174,909.09
210274174,909.09

Martijn Pieters · Answer 1 · 2016-07-18T21:00:19.573

2

You can use itertools.islice() to skip a fixed number of lines:

from itertools import islice

next(islice(reader, 9, 9), None)        
for row in reader:
    print(row[0])

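The islice() object is instructed to skip 9 rows, then stop immediately without producing any results. It is itself a lazy iterator, so nothing is consumed until you advance it; that is why you still need to call next() on it. The None default keeps next() from raising StopIteration.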
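As a minimal sketch of the islice() idiom above, the snippet below wraps it in a small helper and runs it against an in-memory file. The helper name skip_rows and the sample data are illustrative assumptions, not part of the csv module.

```python
import csv
import io
from itertools import islice


def skip_rows(iterator, n):
    """Advance iterator by n items and discard them (hypothetical helper)."""
    # islice(iterator, n, n) yields nothing, but advancing it
    # pulls up to n items from the underlying iterator first.
    next(islice(iterator, n, n), None)


# Made-up sample: three metadata rows, then the real table.
sample = "meta,1\nmeta,2\nmeta,3\nAccount,Amount\n210274174,545.45\n"
reader = csv.reader(io.StringIO(sample))

skip_rows(reader, 3)
rows = list(reader)  # everything after the skipped rows
print(rows)
```

If the file has fewer than n rows, the helper simply exhausts the reader instead of raising an error.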

If you wanted to skip rows until the 'empty' row, that requires a different approach. You'd have to inspect each row and stop reading when you come across one that has only empty cells:

for row in reader:
    if not any(row):  # only empty cells or no cells at all
        break

for row in reader:
    print(row[0])

Demo of the latter approach:

>>> import csv
>>> import io
>>> sample = '''\
... Summary Journal Entry,JE-00000060
... Journal Entry Date,28/02/2015
... Accounting Period,Feb-15
... Accounting Period Start,1/02/2015
... Accounting Period End,28/02/2015
... Included Transaction Types,Invoice Item
... Included Time Period,01/02/2015-09/02/2015
... Journal Run,JR-00000046
... Segments,
... ,
... Customer Account Number,Transaction Amount
... 210274174,545.45
... 210274174,909.09
... 210274174,909.09
... 210274174,909.09
... 210274174,909.09
... '''
>>> with io.StringIO(sample) as csvfile:
...     reader = csv.reader(csvfile)
...     for row in reader:
...         if not [c for c in row if c]:
...             break
...     for row in reader:
...         print(row[0])                   
... 
Customer Account Number
210274174
210274174
210274174
210274174
210274174
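The skip-until-the-empty-row loop can also be written with itertools.dropwhile(), which is a different tool from the for/break loop used in the demo. This is only a sketch under the assumption that the separator row contains nothing but empty cells; the shortened sample data is made up for illustration.

```python
import csv
import io
from itertools import dropwhile

# Shortened, made-up sample with the same shape as the file in the question.
sample = (
    "Journal Run,JR-00000046\n"
    "Segments,\n"
    ",\n"
    "Customer Account Number,Transaction Amount\n"
    "210274174,545.45\n"
    "210274174,909.09\n"
)

reader = csv.reader(io.StringIO(sample))

# dropwhile(any, reader) discards rows while at least one cell is non-empty;
# the first row it yields is the all-empty separator row itself.
remaining = dropwhile(any, reader)
next(remaining, None)  # discard the separator row

first_column = [row[0] for row in remaining]
print(first_column)
```

dropwhile() only filters the leading rows, so an all-empty row later in the file would be passed through unchanged.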

Note that you want to leave newline handling to the csv.reader; when opening your file set newline='':

with open(filepath, 'r', newline='') as csvfile:
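Putting the pieces together, a reading function might look like the sketch below. The function name read_first_column, its skip parameter, and the temporary file contents are assumptions made for illustration; only open(), csv.reader() and islice() come from the answer.

```python
import csv
import os
import tempfile
from itertools import islice


def read_first_column(filepath, skip=9):
    """Return the first cell of every row after the first `skip` rows."""
    # newline='' leaves newline handling to csv.reader,
    # including newlines embedded in quoted fields.
    with open(filepath, 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(islice(reader, skip, skip), None)  # discard the leading rows
        # `if row` guards against completely blank lines, which give [].
        return [row[0] for row in reader if row]


# Illustrative check with a throwaway file (contents are made up).
with tempfile.NamedTemporaryFile('w', suffix='.csv', newline='',
                                 delete=False) as tmp:
    tmp.write("meta,1\r\nmeta,2\r\nAccount,Amount\r\n210274174,545.45\r\n")
    tmp_path = tmp.name

result = read_first_column(tmp_path, skip=2)
os.remove(tmp_path)
print(result)
```

Returning a list rather than printing inside the function keeps the skipping logic easy to test on its own.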

edited Jul 18 '16 at 21:00

answered Feb 19 '15 at 10:05


Is it workable with a condition like WHILE, since we only want to read part of the file? I.e. we want to keep reading the header lines until we reach the blank line. – Ricard Le Feb 19 '15 at 10:32
@RicardLe you don't have a blank line; you have a row of empty cells; the comma still counts. However, you didn't ask how to skip an arbitrary number of rows; that is a *different* problem. – Martijn Pieters Feb 19 '15 at 10:46
I have tried the suggestion above, but it won't skip over the 9 headers. Am I missing something in between? – Ricard Le Feb 19 '15 at 11:18
@RicardLe: ah, my mistake, you are using Python 3. Will correct as `filter(None, ...)` produces an iterable, not a list. – Martijn Pieters Feb 19 '15 at 11:20
It hasn't worked for some reason; it won't skip over the headers. – Ricard Le Feb 19 '15 at 11:42
@RicardLe: sorry, I'm doing this in between things and the `not` got lost. I'll build a quick demo too to show it is now working correctly. – Martijn Pieters Feb 19 '15 at 11:51
I really appreciate your responses. I am able to reproduce your results using the text assigned to the variable sample. But when I changed it to 'Invoice-Item.csv', nothing was printed. – Ricard Le Feb 19 '15 at 12:16

lib · Answer 2 · 2015-02-19T11:50:44.040

1

If you are using numpy, have a look at the skip_header argument of genfromtxt (http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html):

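import numpy as np
# genfromtxt returns a structured array here, so fields are read
# by name with [] rather than as attributes.
r = np.genfromtxt(filepath, skip_header=9, names=['account', 'amount'], delimiter=',')
print(r['account'][0], r['amount'][0])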

edited Feb 19 '15 at 11:50

answered Feb 19 '15 at 11:26


score 1 · Answer 3 · answered Feb 19 '15 at 11:56

1

If you would consider using pandas, read_csv makes reading files very straightforward:

import pandas as pd

data = pd.read_csv(filename, skiprows=9)


FuzzyDuck

