
I'm trying to import a 500 MB CSV file using pandas. When I do this:

import pandas as pd

df = pd.read_csv('filename.csv')
print(df.head())

The result was:

Traceback (most recent call last):
  File "/Users/Filename.py", line 3, in <module>
    df = pd.read_csv ('/Users/Filename.csv')
  File "/Users/venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/Users/venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/Users/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/Users/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1753, in _make_engine
    return mapping[engine](f, **self.options)
  File "/Users/venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 79, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 547, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 636, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1965, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 4540: invalid start byte

Any help would be much appreciated!

wjandrea
  • Welcome to Stack Overflow! Please take the [tour]. SO is a Q&A site, but this isn't a question, and it's not clear what you need help with. To start, do you understand what the error means? If you do, have you tried another encoding, and if so, what happened? Check out [ask]. – wjandrea Mar 02 '23 at 23:22
  • Your file isn't utf-8 encoded. Do you know the encoding? – tdelaney Mar 02 '23 at 23:23
  • `pd.read_csv('filename.csv', encoding='cp1252')` may do it. That would translate '0xa5' to '¥'. – tdelaney Mar 02 '23 at 23:26
  • Possible duplicate: [UnicodeDecodeError when reading CSV file in Pandas with Python](/q/18171739/4518341) – wjandrea Mar 02 '23 at 23:28

1 Answer


It looks like there is a line that cannot be decoded. You can skip bad lines like this:

df = pd.read_csv('filename.csv', on_bad_lines='skip')
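One caveat is worth checking against the pandas documentation. It defines a "bad line" for `on_bad_lines` as one with too many fields, so that option may not cover undecodable bytes. For those, pandas 1.3 and later has a separate `encoding_errors` parameter that accepts the standard codec error handlers such as 'strict', 'replace' and 'ignore'. The following sketch uses hypothetical data to show both options side by side:

```python
import io

import pandas as pd

# Hypothetical data: UTF-8 text with one stray 0xA5 byte (row a=2) and one
# row with a field too many (row a=3).
raw = b"a,b\n1,x\n2,\xa5\n3,y,extra\n4,z\n"

df = pd.read_csv(
    io.BytesIO(raw),
    encoding="utf-8",
    encoding_errors="replace",  # undecodable bytes become U+FFFD instead of raising
    on_bad_lines="skip",        # the row with too many fields is dropped
)
print(df)
```

'replace' throws away the original characters, so it works around the error rather than fixing it. Reading with the file's real encoding keeps them.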
Alessandro Togni
ilshatt
  • Thanks, I have a "solution": my file is a CSV file, but the separator is ';' instead of ','. So using df = pd.read_csv('filename.csv', delimiter=';', encoding='utf-8', on_bad_lines='skip', low_memory=False) gave a better result. Not all the rows and columns were shown, and some characters appear as NaN. – colotech322 Mar 03 '23 at 15:56
  • Nice! I didn't know that some CSV files use ';' as the separator. I'll keep your solution in mind. Thanks! – ilshatt Mar 04 '23 at 15:50
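The semicolon fix from the comments can be sketched as follows. The data is hypothetical, and the cp1252 encoding is an assumption carried over from the earlier comments. In `read_csv`, `delimiter` is an alias for `sep`. With the default separator, every row collapses into a single column, which makes the problem easy to spot:

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample, assumed to be cp1252-encoded.
raw = "id;name;price\n1;widget;\u00a5100\n2;gadget;\u00a5250\n".encode("cp1252")

# Default sep=",": the data has no commas, so each row stays a single field.
wrong = pd.read_csv(io.BytesIO(raw), encoding="cp1252")
print(wrong.shape)  # (2, 1)

# sep=";" (equivalent to delimiter=";") splits the columns correctly.
right = pd.read_csv(io.BytesIO(raw), sep=";", encoding="cp1252")
print(right.shape)  # (2, 3)
print(right)
```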