Reading files with non-ASCII characters

Asked May 23 '19 at 21:32

Active May 23 '19 at 21:41


I am trying to read a text file that contains non-ASCII characters. However, the strings I get back from the file are not unicode.

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

with open("/home/biswadip/Desktop/test", "r") as f:
    content = f.read().splitlines()
print(content)
a = content[0]
print(type(a))

It produces the following output:

['\xc2\xb0', 'a', 'b']

<type 'str'>

Since it's not a unicode string, I can't do normal operations on it such as concatenating another string; doing so raises the "'ascii' codec can't decode" error. I thought from __future__ import unicode_literals was supposed to take care of this, but apparently it doesn't. I know I can call reload(sys) and set the default encoding to utf-8, but I don't think that is a viable solution.
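
To make the mismatch concrete, here is a minimal sketch, assuming the file is UTF-8 encoded (the bytes '\xc2\xb0' in the output are the UTF-8 encoding of the degree sign, U+00B0). The names raw, suffix, text and combined are only illustrative. It shows that unicode_literals affects string literals in the source file, not data read at runtime, and that an explicit decode makes the two sides compatible:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# The raw bytes from the output above (assumed to be UTF-8 for U+00B0).
raw = b"\xc2\xb0"

# unicode_literals only changes the type of string literals in this file;
# it does not convert byte strings that arrive at runtime (e.g. from f.read()).
suffix = " C"  # a text (unicode) literal

# Decoding explicitly turns the bytes into text, so concatenation is safe.
text = raw.decode("utf-8")
combined = text + suffix

# Print only ASCII-safe facts about the result.
print(type(combined).__name__, len(combined))
```

On Python 2, concatenating raw + suffix directly would trigger an implicit ASCII decode of raw, which is where the "'ascii' codec can't decode" error comes from.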

edited May 23 '19 at 21:41

NickD


asked May 23 '19 at 21:32

Biswadip Mandal

Looks like a BOM header. Have you tried passing `encoding="utf-8-sig"` as an argument to `open`? – Jean-François Fabre May 23 '19 at 21:33
@Jean-FrançoisFabre Not a BOM header, but it **is** UTF-8. – Mark Tolonen May 25 '19 at 03:25
So I suppose the duplicate is okay, right? – Jean-François Fabre May 25 '19 at 06:43
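
Following up on the comments above, one possible approach is to let the file object do the decoding. This is a hedged sketch rather than a definitive fix: it assumes the file really is UTF-8, the helper name read_text_lines is hypothetical, and a temporary file stands in for the path from the question. It relies on io.open, which accepts an encoding argument on both Python 2 and Python 3 (the Python 2 built-in open does not):

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io
import os
import tempfile


def read_text_lines(path, encoding="utf-8"):
    """Return the file's lines as text (unicode) strings decoded with `encoding`."""
    # io.open decodes while reading, so no manual .decode() call is needed.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read().splitlines()


# Build a throwaway file holding the same bytes as in the question's output.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\xc2\xb0\na\nb\n")

lines = read_text_lines(demo_path)
os.remove(demo_path)

# Every element is now text, so it can be combined with other text strings.
first = lines[0] + " is the degree sign"
print([type(s).__name__ for s in lines])
```

Passing encoding="utf-8-sig" instead should behave the same for this file, since that codec only strips a leading BOM when one is present.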
