10

Apparently some CSV output implementation somewhere truncates the field separators from the right on the last row, and only the last row in the file, when the trailing fields are null.

Example input CSV; fields 'c' and 'd' are nullable:

a|b|c|d
1|2||
1|2|3|4
3|4||
2|3

In something like the script below, how can I tell whether I am on the last line so I know how to handle it appropriately?

import csv

reader = csv.reader(open('somefile.csv'), delimiter='|', quotechar=None)

header = reader.next()

for line_num, row in enumerate(reader):
    assert len(row) == len(header)
    ....
ʞɔıu
  • 47,148
  • 35
  • 106
  • 149
  • 3
    Note that what you posted is not CSV. CSV stands for "Comma Separated Values" - the separator MUST be a comma. –  Jun 09 '09 at 16:02
  • 8
    most csv implementations allow you to specify an alternate character as the separator – ʞɔıu Jun 09 '09 at 16:06
  • @Neil: huh? "The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. There is no “CSV standard” [..] Still, while the delimiters and quoting characters vary, the overall format is similar enough that [..]" – SilentGhost Jun 09 '09 at 16:08
  • 3
    The only near CSV standard, RFC4180, does not. –  Jun 09 '09 at 16:09
  • @SilentGhost CSV is a special case of a more general data format called DSV (delimiter-separated values). DSV is the format you are talking about. –  Jun 09 '09 at 16:11
  • 2
    In Python, per http://docs.python.org/library/csv.html, "the delimiters and quoting characters vary" -- in particular the delimiter is not necessarily a comma, so the csv module is also used to read "non-comma" separated values, ridiculous as the use may actually seem (just like, e.g., in Excel, "export to CSV" can use any delimiter, depending in that case on the Windows Regional Settings). – Alex Martelli Jun 09 '09 at 16:13
  • 4
    If it makes anyone feel any better I think it's safe to say that the input I have to deal with here obeys no standard of any decency. – ʞɔıu Jun 09 '09 at 16:14

7 Answers

16

Basically you only know you've run out after you've run out. So you could wrap the reader iterator, e.g. as follows:

def isLast(itr):
  old = itr.next()
  for new in itr:
    yield False, old
    old = new
  yield True, old

and change your code to:

for line_num, (is_last, row) in enumerate(isLast(reader)):
    if not is_last: assert len(row) == len(header)

etc.
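
On Python 3, itr.next() becomes next(itr). A minimal sketch of the same wrapper in Python 3 syntax (the empty-iterator guard is my own addition, since on newer Python 3 versions a bare StopIteration escaping a generator becomes a RuntimeError, per PEP 479):

def is_last(itr):
    # yields (is_last, item) pairs, exactly like isLast above
    try:
        old = next(itr)
    except StopIteration:
        return          # empty iterator: nothing to yield
    for new in itr:
        yield False, old
        old = new
    yield True, old

Usage is the same as in the original snippet: for line_num, (is_last_row, row) in enumerate(is_last(reader)): ...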

Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
6

I am aware it is an old question, but I came up with a different answer than the ones presented. The reader object already increments its line_num attribute as you iterate through it. So I first get the total number of lines with row_count, then compare that with reader.line_num on each row.

import csv

def row_count(filename):
    with open(filename) as in_file:
        return sum(1 for _ in in_file)

in_filename = 'somefile.csv'
reader = csv.reader(open(in_filename), delimiter='|')

last_line_number = row_count(in_filename)
for row in reader:
    if last_line_number == reader.line_num:
        print "It is the last line: %s" % row
Eduardo
  • 4,282
  • 2
  • 49
  • 63
  • But you first loop through the entire file just to get the total number of lines. Not very efficient. And what happens if the file changes between the first time you open it and the second? – Stefan Apr 12 '21 at 15:12
3

If you have an expectation of a fixed number of columns in each row, then you should be defensive against:

(1) ANY row being shorter -- e.g. a writer (SQL Server / Query Analyzer IIRC) may omit trailing NULLs at random; users may fiddle with the file using a text editor, including leaving blank lines.

(2) ANY row being longer -- e.g. commas not quoted properly.

You don't need any fancy tricks. Just an old-fashioned if-test in your row-reading loop:

for row in csv.reader(...):
    ncols = len(row)
    if ncols != expected_cols:
        appropriate_action()
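
For the file in the question, one possible way to fill in expected_cols and appropriate_action is to pad short rows and reject long ones; the policy below is my own choice, not part of this answer:

import csv

with open('somefile.csv', newline='') as f:      # Python 3
    reader = csv.reader(f, delimiter='|')
    header = next(reader)
    expected_cols = len(header)
    for row in reader:
        ncols = len(row)
        if ncols < expected_cols:
            # treat missing trailing separators as null fields
            row += [''] * (expected_cols - ncols)
        elif ncols > expected_cols:
            raise ValueError('row %d has %d fields, expected %d'
                             % (reader.line_num, ncols, expected_cols))
        # ... process the normalised row ...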
John Machin
  • 81,303
  • 11
  • 141
  • 189
  • I agree but the source of this data refuses/is too incompetent to send me correctly formatted data. I have no choice but to handle its quirks myself. – ʞɔıu Jul 30 '09 at 16:03
  • Yes you have to handle its quirks yourself and I'm just pointing out that more quirks than "missing trailing null fields in last row" should be checked for in generality AND they can be checked simply without fancy code -- I don't understand your "but". – John Machin Jul 30 '09 at 16:50
1

If you want to get exactly the last row, try this code:

with open("\\".join([myPath,files]), 'r') as f:
    print f.readlines()[-1] #or your own manipulations

If you want to continue working with values from the row, do the following:

f.readlines()[-1].split(",")[0] #this would let you get columns by their index
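
If the last line should still go through the csv module (the question's delimiter is '|', not ','), a small sketch assuming the file fits in memory:

import csv

with open('somefile.csv', newline='') as f:      # Python 3
    last_row = list(csv.reader(f, delimiter='|'))[-1]
print(last_row[0])     # columns by index, as above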
demid5111
  • 21
  • 3
0

If you use for row in reader:, the loop just stops after the last row has been read.
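
A consequence of that, if it helps: in Python the loop variable outlives the loop, so once the for finishes, row still holds the last row and can be patched up separately. A sketch (Python 3 syntax, padding policy is my own):

import csv

with open('somefile.csv', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    header = next(reader)
    row = None
    for row in reader:
        pass                                    # normal per-row processing goes here
    if row is not None:
        row += [''] * (len(header) - len(row))  # pad the truncated trailing fields
        print(row)                              # the (now fixed-up) last row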

iMom0
  • 12,493
  • 3
  • 49
  • 61
0

Could you not just catch the error when the csv reader reads the last line, with a

try:
    # ... do your stuff here ...
except StopIteration:
    ...

construct?

See the following Python code on Stack Overflow for an example of how to use try/except: Python CSV DictReader/Writer issues
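
For what it's worth, StopIteration is only raised into your code if you advance the reader yourself with next(); a plain for loop swallows it. A minimal sketch of that manual style (Python 3 syntax, my own illustration rather than code from the linked post):

import csv

with open('somefile.csv', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    header = next(reader)
    while True:
        try:
            row = next(reader)
        except StopIteration:
            break            # reader exhausted; the previously read row was the last one
        # ... do your stuff here ...

As the comment below points out, this still only tells you the file has ended after the last row has already been read.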

Alex Boschmans
  • 515
  • 2
  • 12
  • That won't tell you when you're on the last line, only after you've passed it. – ʞɔıu Jul 30 '09 at 16:04
  • 1
    I reread your question, and you're right, that's not what you are asking - you want a way to deal with the last line. Why can't you use the solution supplied by John Machin below? – Alex Boschmans Jul 30 '09 at 18:45
0

Just extend the row to the length of the header:

for line_num, row in enumerate(reader):
    while len(row) < len(header):
        row.append('')
    ...
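
With the sample data from the question, the padded row then satisfies the original length check. A quick sketch (Python 3 syntax, using an in-memory copy of the sample instead of a file):

import csv, io

sample = "a|b|c|d\n1|2||\n1|2|3|4\n3|4||\n2|3\n"
reader = csv.reader(io.StringIO(sample), delimiter='|')
header = next(reader)

for line_num, row in enumerate(reader):
    while len(row) < len(header):
        row.append('')
    assert len(row) == len(header)   # now holds for the truncated last row too

print(row)                           # ['2', '3', '', '']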
serbaut
  • 5,852
  • 2
  • 28
  • 32