
I am trying to train my model. I have a CSV file and a .gz file that was generated earlier. I am getting the error below and I'm not sure what is wrong.

Traceback (most recent call last):
  File "Model.py", line 87, in <module>
    data = pd.concat([pd.read_csv(log)])
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 767, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

My code:

import pandas as pd

# `path` is defined elsewhere in Model.py (not shown)
for foo in range(0,1):
    # Read dataframe
    #data = pd.concat([pd.read_csv(log.replace('0',str(idx),1)) for idx in range(5)])
    log = path + 'train_features/log_.csv'

    test_log = path + 'test_features/log_features.gz'
    data = pd.concat([pd.read_csv(log)])
KSp

1 Answer


Try:

data = pd.read_csv(log, encoding="utf-8")

That said, I don't see why you need the for loop or pd.concat here: with a single file, pd.read_csv on its own is enough.
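To illustrate the point about pd.concat: it only does useful work when there are several DataFrames to combine, as in the commented-out line that reads five files. A minimal sketch, using two hypothetical temporary files in place of the real logs:

```python
import os
import tempfile

import pandas as pd

# Hypothetical demo: two small CSV files standing in for the real log files.
tmp = tempfile.mkdtemp()
paths = []
for idx in range(2):
    p = os.path.join(tmp, f"log_{idx}.csv")
    pd.DataFrame({"a": [idx, idx + 10]}).to_csv(p, index=False)
    paths.append(p)

# One file: no loop or concat needed.
single = pd.read_csv(paths[0])

# Several files: read each one, then stack them into a single DataFrame.
combined = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

print(single.shape)    # (2, 1)
print(combined.shape)  # (4, 1)
```

Wrapping a single read_csv call in pd.concat, as in the question, just returns a copy of the same DataFrame.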

If you don't know the file's encoding, try detecting it with chardet:

import chardet
import pandas as pd

with open(log, 'rb') as f:
    result = chardet.detect(f.read())  # or f.readline() if the file is large

data = pd.read_csv(log, encoding=result['encoding'])
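Another possibility is worth ruling out first. A byte of 0x8b at position 1 is exactly what you would see if the file were gzip-compressed, because every gzip stream starts with the bytes 0x1f 0x8b. By default pandas infers compression from the file extension, so a compressed file saved under a .csv name gets parsed as plain text. Here is a minimal sketch, assuming that is what is happening. The helper names and the demo file are hypothetical:

```python
import gzip
import os
import tempfile

import pandas as pd

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC

def read_log(path):
    # The default compression="infer" looks only at the extension, so pass
    # compression explicitly based on the file's actual first bytes.
    compression = "gzip" if is_gzipped(path) else None
    return pd.read_csv(path, compression=compression)

# Hypothetical demo: gzip-compressed data saved under a .csv name.
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "log_.csv")
with gzip.open(path, "wt") as f:
    f.write("a,b\n1,2\n3,4\n")

print(is_gzipped(path))      # True
print(read_log(path).shape)  # (2, 2)
```

If the check returns True for your file, changing the encoding will not help. chardet would only be guessing at compressed bytes.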


Mohit Motwani