I tried reading a CSV file using code of the following form.

import csv

def csv_dict_reader(file_obj):
    reader = csv.reader(file_obj, delimiter=',')
    for row in reader:
        pass  # some operation

with open("data.csv", "r") as file:
    csv_dict_reader(file)

I have referred to the solutions given here, but none of them seems to work. What is the most probable reason for this?

The error:

    for row in reader:
_csv.Error: line contains NULL byte
yobro97
  • The most probable reason is the line contains a NULL (zero value) byte. Without a sample data file, can't say anything more. – Mark Tolonen Nov 12 '17 at 07:32
  • Refer to [this](https://stackoverflow.com/questions/4166070/python-csv-error-line-contains-null-byte). Might be useful! – Anuj Nov 12 '17 at 07:58
  • @MarkTolonen The data file was read successfully when I read the first `10,000` samples. When I tried reading `20,000` samples, it threw this error after reading the `16,843rd` sample. – yobro97 Nov 12 '17 at 08:07
  • So the zero byte is in the 16,843rd line. Text files don't normally have zero bytes so your file could have been corrupted or just written incorrectly. – Mark Tolonen Nov 12 '17 at 14:35

1 Answer

The file contains one or more NULL bytes, which are not compatible with the CSV reader. As a workaround, you could read the file a line at a time and, if a NULL byte is detected, replace it with a space character. The resulting line could then be parsed by a CSV reader by converting the string into a file-like object. Note that the delimiter is , by default, so it does not need to be specified. By adding enumerate(), you could then display which lines in your file contain the NULL bytes.

As you are using a DictReader(), an extra step is needed to first extract the header from your file using a normal csv.reader(). This row can be used to manually specify the fieldnames parameter to your DictReader.

import csv
import StringIO

with open('data.csv', 'rb') as f_input:
    # Use a normal CSV reader to get the header line
    header = next(csv.reader(f_input))

    # The header was line 1 of the file, so the data starts at line 2
    for line_number, raw_line in enumerate(f_input, start=2):
        if '\x00' in raw_line:
            print "Line {} - NULL found".format(line_number)
            raw_line = raw_line.replace('\x00', ' ')

        row = next(csv.DictReader(StringIO.StringIO(raw_line), fieldnames=header))
        print row

Lastly, this code targets Python 2, where a file passed to csv.reader() should be opened in binary mode, e.g. rb. In Python 3, open the file in text mode with newline='' instead.
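The code above uses Python 2 syntax (the print statement and the StringIO module). For Python 3, the same workaround might look like the sketch below. It is only an illustration under a few assumptions: the helper names iter_clean_lines and read_rows are made up for this example, the file is read in text mode, and an in-memory sample stands in for data.csv. Instead of building a new reader per line, it passes a generator of cleaned lines to a single csv.DictReader, which then reads the header itself.

```python
import csv
import io


def iter_clean_lines(file_obj, replacement=' '):
    """Yield lines from file_obj with NULL characters replaced.

    Hypothetical helper for illustration; reports each affected line number.
    """
    for line_number, raw_line in enumerate(file_obj, start=1):
        if '\x00' in raw_line:
            print("Line {} - NULL found".format(line_number))
            raw_line = raw_line.replace('\x00', replacement)
        yield raw_line


def read_rows(file_obj):
    """Parse the cleaned lines; DictReader takes the header from the first line."""
    return list(csv.DictReader(iter_clean_lines(file_obj)))


# In-memory sample standing in for data.csv (assumed layout: a header plus two rows)
sample = io.StringIO("name,value\nalpha,1\nbe\x00ta,2\n")
rows = read_rows(sample)
for row in rows:
    print(row)
```

With a real file, the StringIO object would be replaced by something like open('data.csv', 'r', newline=''). Because the reader consumes the generator lazily, quoted fields that span several lines should still be parsed as one row, which the line-by-line version above cannot do.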

Martin Evans