UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 11: invalid start byte

Question

I am trying to read a CSV file from Google Drive with the pandas library. However, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 11: invalid start byte

I downloaded the data successfully and stored it in "/content/data", which is my working directory.

To read the data I do the following:

import os
import pandas as pd

file1 = os.path.join(os.getcwd(), 'file.txt')
# /content/data/file.txt

df = pd.read_csv(file1, delimiter='\t')

And that's where I get the error. What is the problem here?

I already tried the proposed solutions here: UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c. However, I still get the same error.

It seems that the file is not encoded as UTF-8. You have to find out the right encoding, e.g. with an advanced text editor that can show and change the encoding. – Michael Butscher, Mar 26 '23 at 14:02
You should be able to reproduce this without pandas: `open(file1).read()`. If not, then `sys.stdin.encoding` would be a good guess at its encoding. – tdelaney, Mar 26 '23 at 14:08
@tdelaney I get the same error message with `open(file1).read()`. – bananabread, Mar 26 '23 at 15:00
You submitted your question for review after the last edit, but all you added is *"I tried the solutions in that question and it didn't work"*. You didn't even show ***what*** you tried, and you ***never*** posted a [mre] of the file giving you this error. How can anyone help you except by pointing to that duplicate question? – Tomerikoo, Mar 28 '23 at 14:28
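
The comments above come down to finding out what encoding the file really uses. A minimal sketch of one way to narrow that down with only the standard library: read the raw bytes, look at the byte at the reported position, and try a few candidate encodings in order. The helper name `guess_encoding`, the candidate list, and the sample bytes are assumptions for illustration, not the asker's data. A decode that succeeds does not prove the encoding is right; it only narrows the options.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes `raw` without error.

    latin-1 maps every byte to a character, so it never fails; it is listed
    last as a fallback, not as evidence that the file is really latin-1.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


# Hypothetical sample with byte 0xae at position 11, as in the error message.
# For a real file, read the bytes instead: raw = open(file1, "rb").read()
sample = b"name\tWidget\xae\tprice\n"

print(hex(sample[11]))            # the byte that UTF-8 rejects
print(guess_encoding(sample))     # first candidate that decodes cleanly
print(sample.decode("cp1252"))    # 0xae is the registered sign in cp1252
```

If a candidate such as `cp1252` produces sensible text, it can be passed on to pandas, for example `pd.read_csv(file1, delimiter='\t', encoding='cp1252')`.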

score -2 · Answer 1 · answered Mar 26 '23 at 14:02

Use encoding='unicode_escape':

import os
import pandas as pd

file1 = os.path.join(os.getcwd(), 'file.txt')
# /content/data/file.txt

df = pd.read_csv(file1, encoding='unicode_escape')

Dump Eldor

Why unicode_escape specifically? Given that the bad character isn't in the ASCII range, we are pretty much guaranteed it's not unicode_escape. – tdelaney Mar 26 '23 at 14:07
I tried this already and I get the following error: UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 6432-6433: malformed \N character escape – bananabread Mar 26 '23 at 15:01
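
The two comments above can be reproduced in a few lines. This is a sketch with made-up byte strings, not the asker's file: `unicode_escape` is intended for text that contains Python-style backslash escapes, so a literal backslash followed by `N` anywhere in the data is read as the start of a `\N{...}` escape and fails.

```python
# Hypothetical data: a literal backslash followed by "N", as in a Windows-style path.
data = b"C:\\New\tfolder"

message = None
try:
    data.decode("unicode_escape")
except UnicodeDecodeError as exc:
    # The codec expected "\N{NAME}" and found "\Ne" instead.
    message = str(exc)

print(message)

# Bytes that are not part of an escape are mapped one-to-one to code points,
# which is why 0xae happens to decode here even though the codec is the wrong tool.
print(b"\xae".decode("unicode_escape"))
```

So the codec can get past the 0xae byte, but it is not a general fix: it breaks as soon as the file contains a backslash sequence that is not a valid escape.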
