
I have a JSON file, info.json, that contains Kannada text:

{
  "name":"",
  "url":"",
  "desc":"ಹಾಡುಗಳನ್ನು ಈಗ ಆನಂದಿಸಿ."
}

If I try to read this file without specifying an encoding, like this:

with open('info.json', 'r')

I get this error: 'charmap' codec can't decode byte 0x8d in position 38: character maps to <undefined>
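A minimal reproduction of that error, assuming (the question does not say) that the platform default encoding open() fell back to was cp1252, as is common on Windows. The UTF-8 encoding of Kannada letters contains bytes such as 0x8d (the byte named in the error) and 0x81, which cp1252 leaves unmapped:

```python
# The "desc" value from info.json, and the UTF-8 bytes stored on disk.
desc = "ಹಾಡುಗಳನ್ನು ಈಗ ಆನಂದಿಸಿ."
raw = desc.encode("utf-8")

# Decoding those bytes as cp1252 (assumed to be the platform default)
# fails on bytes that code page does not map, e.g. 0x81 or 0x8d.
try:
    raw.decode("cp1252")
except UnicodeDecodeError as err:
    print("cp1252:", err)

# Decoding with the codec the file was actually written in round-trips.
print(raw.decode("utf-8") == desc)  # True
```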

If I open it as UTF-8 instead:

with open('info.json', 'r', encoding='utf-8')

then only the Kannada content is converted into Unicode escape sequences like \u0c85\u0ca4\u0ccd\u0ca4\u0cb2\u0cbf\u0ca4\u0ccd\u0ca4

Since this is a string, I am having trouble converting it back to the actual Kannada characters.

I tried various ways of decoding it, for example:

str(infoObj['desc'], "utf-8"),
infoObj['desc'].decode('unicode-escape')
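Assuming Python 3 (the encoding= argument to open() implies it), both attempts fail before converting anything, because infoObj['desc'] is already a decoded str. A minimal check, with a short stand-in value for desc:

```python
desc = "\u0c85\u0ca4"  # stand-in for infoObj['desc'], already a str

# str(obj, encoding) only decodes bytes-like objects.
try:
    str(desc, "utf-8")
except TypeError as err:
    print("str(desc, 'utf-8') ->", err)  # decoding str is not supported

# In Python 3, str has no .decode() method (only bytes does).
try:
    desc.decode("unicode-escape")
except AttributeError as err:
    print("desc.decode(...) ->", err)
```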

I have spent five hours researching this without success.

How can I get the Kannada text back?

Thanks in advance.

hondvryer
  • Can you provide complete Python code that would help SO members try to reproduce and solve your problem? – dkb Jan 18 '19 at 12:05
  • A simple command like `with open('info.json', 'r', encoding='utf-8') as file: for line in file: print(line)` works fine and prints the Kannada text in the console. – dkb Jan 18 '19 at 12:06
  • This also works fine: `import json from pprint import pprint with open('info.json', 'r', encoding='utf-8') as file: data = json.load(file) pprint(data)` – dkb Jan 18 '19 at 12:10
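The code from the last comment, written out as a self-contained script. To make it runnable anywhere, it first writes a copy of the question's info.json into a temporary directory; that setup step is an addition for this sketch, not part of the comment:

```python
import json
import os
import tempfile
from pprint import pprint

# Setup (an addition for this sketch): recreate the question's info.json
# in a throwaway directory, written explicitly as UTF-8.
CONTENT = '{\n  "name":"",\n  "url":"",\n  "desc":"ಹಾಡುಗಳನ್ನು ಈಗ ಆನಂದಿಸಿ."\n}\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "info.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(CONTENT)

    # The comment's code: read the file back, naming the encoding explicitly.
    with open(path, "r", encoding="utf-8") as file:
        data = json.load(file)

pprint(data)
print(data["desc"])  # Kannada letters, if the console's encoding can show them
```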

2 Answers


If I use UTF-8 like with open('info.json', 'r', encoding='utf-8')

only the Kannada Content is converted into Escape Unicode Entities like \u0c85\u0ca4\u0ccd\u0ca4\u0cb2\u0cbf\u0ca4\u0ccd\u0ca4

No, it is not.

The Kannada content is correctly decoded into a Python string containing the Kannada letters. It is simply that, depending on how you display a non-ASCII string, some characters may be shown as their Unicode escape values, may disappear, or may be replaced by a special replacement character.

And in a string literal, Python makes no distinction between a character and its escaped representation:

>>> "\x41\x62" == "Ab"
True
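The same holds for Kannada: "\u0c85" in source code is one character, not six. A quick way to tell which case you have is len(). Only if the string really contains backslashes is there anything to convert, and one way to do that (an illustration, not part of this answer) is to let the JSON parser interpret the escapes:

```python
import json
import unicodedata

# An escape sequence in a string literal produces the character itself.
s = "\u0c85"
print(len(s), unicodedata.name(s))  # 1 KANNADA LETTER A

# A raw string keeps the backslash: six separate ASCII characters.
literal = r"\u0c85"
print(len(literal))  # 6

# Only this second kind of string needs converting; here the JSON parser
# interprets the escape (assumes the text contains no unescaped quotes).
print(json.loads('"' + literal + '"') == s)  # True
```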

So you may have a problem displaying Kannada letters, but not with decoding the JSON file.
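A sketch of how one and the same decoded string can be displayed either as letters or as escapes. It is an assumption that the escapes in the question came from one of these routes; for example, json.dumps escapes non-ASCII characters by default because its ensure_ascii parameter defaults to True:

```python
import json

# A decoded str holding the first four letters from the question's escapes.
desc = "\u0c85\u0ca4\u0ccd\u0ca4"

print(desc)                                  # letters; may raise UnicodeEncodeError on a console that cannot encode them
print(ascii(desc))                           # '\u0c85\u0ca4\u0ccd\u0ca4'
print(json.dumps(desc))                      # "\u0c85\u0ca4\u0ccd\u0ca4" (ensure_ascii=True)
print(json.dumps(desc, ensure_ascii=False))  # the letters, in double quotes

# Every form decodes back to the same string.
assert json.loads(json.dumps(desc)) == desc
```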

Serge Ballesta

It worked for me when I added errors='ignore' along with the UTF-8 encoding:

with open('info.json', 'r', encoding='utf8', errors='ignore')
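Note that errors='ignore' (which has the same meaning in open() as in bytes.decode()) only affects bytes that are not valid UTF-8, and it silently discards them instead of raising. If info.json is valid UTF-8, the encoding argument alone is presumably what made the difference. A minimal sketch with made-up byte strings, not the asker's file:

```python
# U+0C85 (KANNADA LETTER A) as UTF-8, plus a stray 0xff byte,
# which can never appear in valid UTF-8.
good = b"\xe0\xb2\x85"
bad = good + b"\xff"

# On valid UTF-8, errors="ignore" changes nothing.
assert good.decode("utf-8", errors="ignore") == good.decode("utf-8") == "\u0c85"

# With the default errors="strict", the invalid byte raises.
try:
    bad.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict:", err)

# With errors="ignore", the invalid byte is silently dropped.
print("ignore:", ascii(bad.decode("utf-8", errors="ignore")))  # ignore: '\u0c85'
```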
Praveen