I have a CSV file that contains text in multiple languages (Russian, Thai, Chinese). When I try to read it with encoding='utf-8', I get an error (as stated in the title).

Below is the code I use to read the CSV:

import pandas as pd

df = pd.read_csv(r'/Users/syafiq/Desktop/a.csv', encoding='utf-8')

Below is the full traceback:

Traceback (most recent call last):

  File "<ipython-input-6-60598a28d158>", line 1, in <module>
    runfile('/Users/syafiq/Desktop/Pandas/beep3.py', wdir='/Users/syafiq/Desktop/Pandas')

  File "/Users/syafiq/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "/Users/syafiq/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/syafiq/Desktop/Pandas/beep3.py", line 25, in <module>
    main()

  File "/Users/syafiq/Desktop/Pandas/beep3.py", line 16, in main
    df = pd.read_csv(r'/Users/syafiq/Desktop/a.csv', encoding='utf-8')

  File "/Users/syafiq/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "/Users/syafiq/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)

  File "/Users/syafiq/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)

  File "/Users/syafiq/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)

  File "/Users/syafiq/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)

  File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__

  File "pandas/_libs/parsers.pyx", line 764, in pandas._libs.parsers.TextReader._get_header

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 0: invalid start byte

Thank you in advance!

hula-hula
  • Did you try changing `encoding='utf-8'` to `encoding='latin-1'`? – ManojK Apr 05 '20 at 16:24
  • You probably want to check the separator of your csv; it's probably not `,`, which is the default for `pd.read_csv`. – Erfan Apr 05 '20 at 16:26
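
The two suggestions in the comments can be checked before calling `pd.read_csv`. What follows is only a minimal sketch using the standard library: the helper names (`first_decodable_encoding`, `sniff_separator`, `inspect_csv`) and the candidate encoding list are assumptions made for illustration, not part of pandas. A clean decode does not prove the encoding is right. `latin-1` accepts every byte, so it is tried last, and a single-byte encoding cannot represent Thai or Chinese, so the decoded text still needs a visual check.

```python
import csv
from pathlib import Path


def first_decodable_encoding(raw, candidates=("utf-8-sig", "cp1252", "latin-1")):
    """Return the first candidate that decodes `raw` without an error.

    The candidate list is an illustrative assumption; adjust it to the
    encodings the file could plausibly use.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def sniff_separator(text, default=","):
    """Guess the field separator with csv.Sniffer, falling back to `default`."""
    try:
        return csv.Sniffer().sniff(text, delimiters=",;\t|").delimiter
    except csv.Error:
        return default


def inspect_csv(path):
    """Return (encoding, separator) guesses for a modestly sized file."""
    raw = Path(path).read_bytes()
    encoding = first_decodable_encoding(raw)
    if encoding is None:
        return None, None
    # Sniff on the first few KB of decoded text only.
    return encoding, sniff_separator(raw.decode(encoding)[:8192])


# Possible use with pandas (not executed here):
#   encoding, sep = inspect_csv('/Users/syafiq/Desktop/a.csv')
#   df = pd.read_csv('/Users/syafiq/Desktop/a.csv', encoding=encoding, sep=sep)
```

If the guessed encoding yields garbled Russian, Thai, or Chinese text, the file was most likely saved in some other encoding, and the guess should be treated as a starting point only.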
