I am trying to read a txt file from a url:

df = pd.read_csv(url, sep = "@#$", header = None, engine = 'python')

But I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9b in position 3706: invalid start byte

I tried the following based on solutions I found online:

error_bad_lines = False
encoding = 'utf_8'
encoding = 'utf_16'

I also tried the answer in this link, but I get the following error when I try that:

TypeError: Expected object of type bytes or bytearray, got: <class 'http.client.HTTPResponse'>

But nothing is working. Any other ideas?

  • Where did the URL come from? Are you running Windows? 0x9B is the single right-pointing angle quotation mark ("›") in Windows CP1252, so the file is not UTF-8 at all. – Tim Roberts Mar 16 '21 at 21:13
  • @TimRoberts I am using a Windows laptop, but I am running the code on EC2 (Ubuntu). There is a ">" in the file, though; I just checked. – Kaushik Karalgikar Mar 16 '21 at 21:16
  • Try out `encoding='cp1252'` as @TimRoberts suggested, and if you're still getting an error, please include all your code and the full traceback. We can't diagnose just from the exception line. See this post on [understanding the python traceback](https://realpython.com/python-traceback/). – Michael Delgado Mar 16 '21 at 21:20
  • If you want to cheat, you could just use an editor to replace that with a greater-than sign, but what you really need to do is read that in with `open('name',encoding='cp1252')`. – Tim Roberts Mar 16 '21 at 21:20
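The suggestion in the comments can be sketched as below. This is only an illustration under assumptions: the sample bytes stand in for the downloaded file, and `decode_to_buffer` is a hypothetical helper name, not something from the question. It shows that byte 0x9b fails under UTF-8 but decodes under cp1252, and that the decoded text can be wrapped in a buffer.

```python
import io


def decode_to_buffer(raw: bytes, encoding: str = "cp1252") -> io.StringIO:
    """Decode raw bytes and wrap the text in a file-like buffer.

    Hypothetical helper for illustration; a text buffer like this can be
    passed to pd.read_csv in place of a path or URL.
    """
    return io.StringIO(raw.decode(encoding))


# Stand-in for the downloaded file contents (assumption: the real file
# contains byte 0x9b somewhere, as the error message reports).
sample = b"left \x9b right\n"

# Decoding as UTF-8 fails, because 0x9b is not a valid UTF-8 start byte.
utf8_error = None
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = str(exc)

# Decoding as cp1252 succeeds and maps 0x9b to U+203A.
text = decode_to_buffer(sample).read()

# With a real URL, the bytes would presumably come from the response body,
# for example urllib.request.urlopen(url).read(), rather than from the
# response object itself.
```

If the file really is cp1252, passing `encoding='cp1252'` directly to `pd.read_csv` should have the same effect without the manual decoding step. The `TypeError` in the question looks like a response object was handed to something that expects bytes, so calling `.read()` on the response first is likely what that approach was missing.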
