
I am trying to read a CSV file using the following code:

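import pandas as pd

# f.encoding is the encoding open() chose by default (the platform's preferred
# encoding, e.g. cp1252 on many Windows setups); it is not detected from the file's bytes.
with open('test.csv') as f:
    encoding = f.encoding

test = pd.read_csv('test.csv', encoding=encoding, sep='|', dtype=str, header=None)
test.head()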

The output is a DataFrame full of NaN values, except for the first element, which looks like Cyrillic or Greek text.

The encoding reported above is 'cp1252'. I tried 'ISO-8859-1' but got the same result. I also tried 'utf-8' but got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I even tried the Python engine, which produced the following error:

ParserError: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead
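Both errors are what you would expect from a UTF-16 text file that starts with a little-endian byte order mark (bytes FF FE). A minimal sketch, using a hypothetical sample row rather than the actual test.csv, shows where the leading 0xff byte and the NULL bytes come from:

```python
import codecs

# Hypothetical sample: one pipe-separated row stored as UTF-16 little-endian
# with a BOM. This is an assumption about the file, not its real contents.
raw = codecs.BOM_UTF16_LE + "a|b|c\r\n".encode("utf-16-le")

print(raw[:8])         # b'\xff\xfea\x00|\x00b\x00'
print(b"\x00" in raw)  # True: each ASCII character is followed by a NULL byte

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Same message as the error in the question.
    print(exc)  # 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

# The "utf-16" codec reads the BOM, picks the byte order, and strips the BOM.
print(raw.decode("utf-16").split("|"))  # ['a', 'b', 'c\r\n']
```

Single-byte codecs such as cp1252 or ISO-8859-1 never raise on these bytes, which is why they "succeed" but turn the BOM into odd-looking characters and leave NULL bytes inside every field.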

By the way, the file loads without problems in Excel using Text to Columns.

nrcjea001

1 Answer


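The CSV file has a byte order mark (BOM). The 0xff byte at position 0 in the UTF-8 error and the NULL bytes reported by the Python engine are consistent with a UTF-16 file, so the encoding has to be given explicitly rather than taken from f.encoding. The answer to my question may be found here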
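A minimal sketch of one way to act on this, assuming the only clue available is the BOM itself: peek at the first bytes of the file, map a known BOM to a Python codec that strips it, and then read the '|'-separated data. The helper names `sniff_bom_encoding` and `read_pipe_delimited` are hypothetical, and the standard-library `csv` module stands in for pandas so the example has no third-party dependencies.

```python
import codecs
import csv
import os
import tempfile

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones:
# the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(path, default="utf-8"):
    """Return a codec name based on the file's BOM, or `default` if there is none.

    'utf-8-sig', 'utf-16' and 'utf-32' all consume the BOM while decoding,
    so it does not end up glued to the first field.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return default


def read_pipe_delimited(path):
    """Read a '|'-separated file as a list of rows, using the sniffed encoding."""
    encoding = sniff_bom_encoding(path)
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f, delimiter="|"))


if __name__ == "__main__":
    # Hypothetical sample file; str.encode("utf-16") writes a BOM first.
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
        tmp.write("a|b|c\r\n1|2|3\r\n".encode("utf-16"))
        path = tmp.name
    try:
        print(sniff_bom_encoding(path))   # utf-16
        print(read_pipe_delimited(path))  # [['a', 'b', 'c'], ['1', '2', '3']]
    finally:
        os.remove(path)
```

With pandas, the same idea should amount to passing the detected codec instead of f.encoding, for example `pd.read_csv('test.csv', encoding='utf-16', sep='|', dtype=str, header=None)`, assuming the file really is UTF-16.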

nrcjea001