
When I try to read a CSV file, I get this error:

Traceback (most recent call last):
  File "/root/Downloads/csvafa.py", line 4, in <module>
    for i in a:
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

The code I used:

import csv
with open('Book1.csv') as f:
    a=csv.reader(f)
    for i in a:
        print(i)

I also tried changing the encoding to latin1:

import csv
with open('Book1.csv',encoding='latin1') as f:
    a=csv.reader(f)
    for i in a:
        print(i)

After that, I get a different error:

Traceback (most recent call last):
  File "/root/Downloads/csvafa.py", line 4, in <module>
    for i in a:
_csv.Error: line contains NUL

I am a beginner in Python.

James Z

2 Answers

1

This error is raised when the bytes being read are not valid in the expected encoding. When a byte sequence can't be decoded as UTF-8, Python raises a UnicodeDecodeError. You can try a different encoding, such as 'latin-1' or 'iso-8859-1' (two names for the same codec).

import pandas as pd
dataset = pd.read_csv('Book1.csv', encoding='ISO-8859-1')

It can also be that the data is compressed. Have a look at this answer.
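
One quick way to check for compression is to look at the first bytes of the file. The following is only a minimal sketch: `looks_gzipped` is a hypothetical helper name (not a library function), and it only recognizes gzip, not other compression formats.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    # Binary mode: the bytes are read as-is, so no decoding error can occur.
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Example usage (assumes Book1.csv exists in the current directory):
# print(looks_gzipped("Book1.csv"))
```

If this returns True, changing the text encoding will not help, because the file has to be decompressed before it can be parsed as CSV.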

Dharman
Nutan
0

I would try reading the file with UTF-8 encoding.

Another solution might be this answer:

It's still most likely gzipped data. gzip's magic number is 0x1f 0x8b, which is consistent with the UnicodeDecodeError you get (byte 0x8b at position 1).

You could try decompressing the data on the fly:

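import gzip
import pandas as pd
# Open the raw bytes and let gzip decompress them before pandas parses the CSV.
with open('destinations.csv', 'rb') as fd:
    gzip_fd = gzip.GzipFile(fileobj=fd)
    destinations = pd.read_csv(gzip_fd)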
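
The same idea can be applied with the standard `csv` module used in the question. This is a rough sketch under the assumption that the file really is gzip-compressed and that the decompressed text is UTF-8; `read_gzipped_csv` is a hypothetical helper name introduced only for illustration.

```python
import csv
import gzip


def read_gzipped_csv(path, encoding="utf-8"):
    """Return all rows of a gzip-compressed CSV file as lists of strings."""
    # Mode "rt" makes gzip.open decompress and decode to text in one step.
    # newline="" is what the csv documentation recommends for input files.
    with gzip.open(path, "rt", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Example usage (assumes Book1.csv actually contains gzip data):
# for row in read_gzipped_csv("Book1.csv"):
#     print(row)
```

With pandas, passing `compression='gzip'` to `pd.read_csv` should have a similar effect when the file name does not end in `.gz`.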
kaki gadol