Python continue for loop after exception

Question

I'm trying to create a new version of a file that excludes NULL bytes. I'm using the code below to attempt this; however, it still breaks on the NULL byte. How should I structure the for statement and try/except block so that the loop keeps going after the exception?

import csv

input_file = "/data/train.txt"
outFileName = "/data/train_no_null.txt"
############################

i_f = open( input_file, 'r' )
reader = csv.reader( i_f , delimiter = '|' )

outFile = open(outFileName, 'wb') 
mywriter = csv.writer(outFile, delimiter = '|')

i_f.seek( 0 )
i = 1

for line in reader:
    try:
        i += 1
        mywriter.writerow(line)

    except csv.Error:
        print('csv choked on line %s' % (i + 1))
        pass

EDIT:

Here's the error message:

Traceback (most recent call last):
  File "20150310_rewrite_csv_wo_NULL.py", line 26, in <module>
    for line in reader:
_csv.Error: line contains NULL byte

UPDATE:

I'm using this code:

i_f = open( input_file, 'r' )
reader = csv.reader( i_f , delimiter = '|' )
# reader.next()

outFile = open(outFileName, 'wb') 
mywriter = csv.writer(outFile, delimiter = '|')

i_f.seek( 0 )
i = 1


for idx, line in enumerate(reader):
    try:
        mywriter.writerow(line)
    except:
        print('csv choked on line %s' % idx)

and now get this error:

Traceback (most recent call last):
  File "20150310_rewrite_csv_wo_NULL.py", line 26, in <module>
    for idx, line in enumerate(reader):
_csv.Error: line contains NULL byte

What error do you get? If it is not a `csv.Error`, then you simply need more except clauses to handle it. – mdurant, Mar 10 '15 at 20:12
You can see the "_" character... so it's an exception defined in a module somewhere within csv. – mdurant, Mar 10 '15 at 20:20
Just wondering if you've seen this question: http://stackoverflow.com/questions/4166070/python-csv-error-line-contains-null-byte ? – Captain Whippet, Mar 10 '15 at 21:59
@CaptainWhippet: I have. The one other caveat is that the file I'm working with is 20GB, so I can't read it into memory before rewriting. – screechOwl, Mar 10 '15 at 22:02

score 0 · Answer 1 · answered Mar 10 '15 at 20:16

You can catch all errors with the following code...

for idx, line in enumerate(reader):
    try:
        mywriter.writerow(line)
    except:
        print('csv choked on line %s' % idx)

Alex

score 0 · Answer 2 · answered Mar 10 '15 at 22:19

The exception is raised by the reader while the for statement fetches the next row. That happens outside the try/except block, so it is never caught.
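
As a rough illustration of that point, the loop could be restructured so that the call that fetches the next row sits inside the try. This is only a sketch of the control flow, written for Python 3: `rows_skipping_errors` and `on_error` are hypothetical names, not part of the csv module, and it assumes the reader can be advanced again after raising, which is not guaranteed.

```python
import csv


def rows_skipping_errors(reader, on_error=None):
    """Yield rows from `reader`, skipping any row whose fetch raises csv.Error.

    Hypothetical helper: it only moves the next() call inside the try block.
    """
    iterator = iter(reader)
    row_number = 0
    while True:
        row_number += 1
        try:
            row = next(iterator)  # the call that can raise is now inside the try
        except StopIteration:
            return  # input exhausted
        except csv.Error as exc:
            if on_error is not None:
                on_error(row_number, exc)  # report the bad row, then move on
            continue
        yield row
```

With this in place the original loop would read `for line in rows_skipping_errors(reader): mywriter.writerow(line)`.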

Even if the exception were caught, the reader won't want to continue after its encounter with the NUL byte. If the reader never sees the NUL byte in the first place, though, along the lines of...

for idx, line in enumerate(csv.reader((line.replace('\0','') for line in open('myfile.csv')), delimiter='|')):

you might be OK.
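
A fuller version of that idea might look like the sketch below. It assumes Python 3 (the question's code uses Python 2 idioms such as 'wb'), and `strip_nul` and `rewrite_without_nul` are hypothetical helper names. The file is processed line by line, so a 20GB input never has to fit in memory.

```python
import csv


def strip_nul(lines):
    """Lazily yield each line with NUL characters removed."""
    for line in lines:
        yield line.replace('\0', '')


def rewrite_without_nul(src_path, dst_path, delimiter='|'):
    """Copy a delimited file row by row, dropping NUL characters.

    Returns the number of rows written.
    """
    count = 0
    # newline='' lets the csv module handle line endings itself
    with open(src_path, 'r', newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(strip_nul(src), delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)
        for row in reader:  # the reader only ever sees cleaned lines
            writer.writerow(row)
            count += 1
    return count
```

For the paths in the question this would be called as `rewrite_without_nul("/data/train.txt", "/data/train_no_null.txt")`.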

Really though, you should find out where the NUL bytes are coming from as they might be symptomatic of a wider problem with your data.

langton

The data is coming out of a Redshift database and generally lives in S3. Any idea whether the NUL bytes are a product of that environment or were already in the data before it went into Redshift? – screechOwl Mar 10 '15 at 22:33
I'm not sure how you've ended up with those NUL bytes in the data then, getting them into Redshift would be difficult: _If your data includes a null terminator, also referred to as NUL (UTF-8 0000) or binary zero (0x000), COPY treats it as an end of record (EOR) and terminates the record._ [http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html]. And if you can't get them in there, I don't know how you got them back out! – langton Mar 10 '15 at 22:44