
I am trying to read each line of a CSV file, but I get a "line contains NULL byte" error.

reader = csv.reader(open(mycsv, 'rU'))
for line in reader:
    print(line)


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: line contains NULL byte

Using the check below, I confirmed that the file contains null bytes.

if '\0' in open(mycsv).read():
    print("have null byte")

What's the best way to work around this? Should I replace '\0' on every line? I need to process this kind of file daily, and each one has about 400,000 lines (1 GB) of data. I assume a replace would slow this down substantially.

Eric

1 Answer


Try this!

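import csv

def mycsv_reader(csv_reader):
    """Yield rows from csv_reader, skipping rows that raise csv.Error."""
    while True:
        try:
            yield next(csv_reader)
        except csv.Error:
            # Handle the bad row however you want; here it is skipped.
            continue
        except StopIteration:
            # End of input. On Python 3.7+ a StopIteration that escapes
            # a generator becomes a RuntimeError, so return explicitly.
            return

if __name__ == '__main__':
    # mycsv is the path to the CSV file, as in the question.
    # 'rU' follows the question; Python 3.11 removed the 'U' mode,
    # so use open(mycsv, newline='') there instead.
    reader = mycsv_reader(csv.reader(open(mycsv, 'rU')))
    for line in reader:
        print(line)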
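
If dropping the affected rows is not acceptable, another option is to strip the null bytes lazily while the file is being read, so the 1 GB file never has to sit in memory all at once. This is a minimal sketch, assuming Python 3 and assuming the null bytes are stray padding rather than a sign that the file is UTF-16 encoded. The names strip_nulls and read_csv_without_nulls are hypothetical helpers, not part of the csv module.

```python
import csv


def strip_nulls(lines):
    """Yield each line with NUL characters removed (hypothetical helper)."""
    for line in lines:
        yield line.replace('\0', '')


def read_csv_without_nulls(path):
    """Yield parsed rows from the CSV file at `path`, ignoring NUL characters."""
    # newline='' is what the csv module documentation recommends for input files.
    with open(path, newline='') as f:
        # csv.reader accepts any iterable of strings, so it can consume
        # the cleaned lines one at a time.
        for row in csv.reader(strip_nulls(f)):
            yield row
```

Usage would look like `for row in read_csv_without_nulls(mycsv): print(row)`. Since only one line is cleaned at a time, the extra cost is probably small compared with the CSV parsing itself, but that is worth measuring on the real data.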
hjpotter92
han058
  • That works and gets me through the file. I am just wondering why I am getting these null bytes. Are they maybe used instead of commas as separators? f.count('\x00') returns 1926 of these. – Eric Sep 26 '14 at 04:04
  • Please refer to http://stackoverflow.com/questions/7894856/line-contains-null-byte-in-csv-reader-python – han058 Sep 26 '14 at 04:23
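
To find out where the null bytes come from, it can help to look at the raw bytes rather than the decoded text. The sketch below, assuming Python 3, counts null bytes chunk by chunk and reports whether the file starts with a UTF-16 byte order mark, which is one common cause of this error. The name inspect_nulls is a hypothetical helper, and the 200-byte sample size is an arbitrary choice.

```python
import codecs


def inspect_nulls(path, chunk_size=1 << 20):
    """Return (null_count, has_utf16_bom, first_bytes) for the file at `path`."""
    null_count = 0
    first_bytes = b''
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if not first_bytes:
                # Keep a small sample from the start of the file for inspection.
                first_bytes = chunk[:200]
            null_count += chunk.count(b'\0')
    has_utf16_bom = first_bytes.startswith(
        (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
    )
    return null_count, has_utf16_bom, first_bytes
```

If has_utf16_bom is true, or roughly every other byte in first_bytes is \x00, the file is probably UTF-16, and opening it with encoding='utf-16' is likely a better fix than stripping. Otherwise, printing repr(first_bytes) shows what sits around the null bytes.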