UnicodeDecodeError while concatenating

Question

I am trying to train my model. I have a CSV file and a .gz file, which was generated earlier. I am getting the error below and I'm not sure what is wrong.

Traceback (most recent call last):
  File "Model.py", line 87, in <module>
    data = pd.concat([pd.read_csv(log)])
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 767, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

My code:

import pandas as pd

for foo in range(0,1):
    # Read dataframe
    #data = pd.concat([pd.read_csv(log.replace('0',str(idx),1)) for idx in range(5)])
    log = path + 'train_features/log_.csv'

    test_log = path + 'test_features/log_features.gz'
    data = pd.concat([pd.read_csv(log)])
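A clue worth checking before trying other encodings: every gzip stream starts with the two bytes 0x1f 0x8b, so a decode failure on byte 0x8b at position 1 suggests the file at `log` may itself be gzip-compressed even though its name ends in .csv. With the default compression='infer', pandas decides whether to decompress from the file extension only, so such a file would be parsed as raw text. Below is a minimal sketch of a hypothetical helper (`read_csv_maybe_gzip` is not part of pandas) that peeks at the first two bytes and passes compression='gzip' when they match; it assumes the file is otherwise an ordinary CSV.

```python
import pandas as pd

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_csv_maybe_gzip(path, **kwargs):
    """Read a CSV, forcing gzip decompression if the file starts with the gzip magic bytes.

    Hypothetical helper: pandas' compression='infer' only looks at the file
    extension, so a gzip file named *.csv would otherwise be parsed as text.
    """
    with open(path, "rb") as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        # Only set compression if the caller did not choose one explicitly
        kwargs.setdefault("compression", "gzip")
    return pd.read_csv(path, **kwargs)
```

In the loop above this would be `data = read_csv_maybe_gzip(log)`. If the file turns out to be gzip data, calling `pd.read_csv(log, compression='gzip')` directly, or renaming the file with a .gz extension, is the simpler fix.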

Mohit Motwani · Answer 1 · 2018-11-28T12:35:09.983

Try:

data = pd.read_csv(log, encoding="utf-8")

That said, I don't understand why you need the for loop or pd.concat here.
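For reference, pd.concat only matters when several frames are combined, which is what the commented-out line in the question seems to attempt. A minimal sketch with a hypothetical helper name, `read_many_csv`, assuming the file paths are already known:

```python
import pandas as pd


def read_many_csv(paths, **kwargs):
    """Read each CSV in `paths` and stack the rows into one DataFrame.

    `read_many_csv` is a hypothetical helper; extra keyword arguments
    (e.g. encoding=...) are passed straight to pd.read_csv.
    """
    frames = [pd.read_csv(p, **kwargs) for p in paths]
    # ignore_index=True numbers the combined rows 0..n-1 instead of
    # repeating each file's own 0-based index.
    return pd.concat(frames, ignore_index=True)
```

With a single file, `pd.read_csv(log)` on its own returns the same data as wrapping it in pd.concat.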

If you don't know your file's encoding, try this:

import chardet
import pandas as pd

# Guess the encoding from the raw bytes, then pass it to read_csv
with open(log, 'rb') as f:
    result = chardet.detect(f.read())  # or f.readline() if the file is large

data = pd.read_csv(log, encoding=result['encoding'])

edited Nov 28 '18 at 12:35

answered Nov 28 '18 at 12:13

Thanks, but the error still persists after doing this. – KSp Nov 28 '18 at 12:26
Try `encoding='ISO-8859-1'` or `encoding='cp1252'`. It really depends on your file's encoding. – Mohit Motwani Nov 28 '18 at 12:29
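Rather than editing the call by hand for each guess, the encodings suggested in this comment can be tried in order. A minimal sketch, assuming the file really is plain text in one of these encodings; `read_csv_any_encoding` and the candidate list are illustrative, not a pandas API:

```python
import pandas as pd

# Candidate encodings, tried in order. ISO-8859-1 maps every byte to a
# character, so it never raises and belongs last as a catch-all.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "ISO-8859-1")


def read_csv_any_encoding(path, encodings=CANDIDATE_ENCODINGS, **kwargs):
    """Return (DataFrame, encoding) for the first encoding that decodes the file."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next candidate
    if last_error is None:
        raise ValueError("no candidate encodings given")
    raise last_error
```

Because every byte value is valid in ISO-8859-1, the last candidate will not raise a decode error even when the bytes are not text at all (for example compressed data); in that case any result is meaningless, or parsing fails later for a different reason.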