In Python, I have text in a non-English language, for example:

import json

name = "அரவிந்த்"

result = {"Name": name}
j_res = json.dumps(result)
print j_res

Output:

{"Name": "\u0b85\u0bb0\u0bb5\u0bbf\u0ba8\u0bcd\u0ba4\u0bcd"}

Is there any way to get the name அரவிந்த் back from the escaped text \u0b85\u0bb0\u0bb5\u0bbf\u0ba8\u0bcd\u0ba4\u0bcd?

ti7
  • Consider using Python 3, which simplifies or solves many character-handling problems (so you only need to learn encoding/decoding once). Python 2 is not just obsolete (and has been for many years) but also no longer supported, so you will find fewer and fewer resources on converting Python 2 to Python 3, and it will get harder to find tools and modules for Python 2. – Giacomo Catenazzi Jun 17 '20 at 09:39

2 Answers

Yes, it is simple: load the JSON back with json.loads, which decodes the \u escapes:

# -*- coding: utf-8 -*-

import json

name = "அரவிந்த்"

result = {"Name": name}
j_res = json.dumps(result)

print j_res
print json.loads(j_res)
print json.loads(j_res)["Name"]

Output:

{"Name": "\u0b85\u0bb0\u0bb5\u0bbf\u0ba8\u0bcd\u0ba4\u0bcd"}
{u'Name': u'\u0b85\u0bb0\u0bb5\u0bbf\u0ba8\u0bcd\u0ba4\u0bcd'}
அரவிந்த்
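
For comparison, here is a minimal sketch of the same round trip, assuming Python 3 rather than the Python 2 used above (the variable names are only illustrative). The \uXXXX escapes are simply how json.dumps writes non-ASCII characters by default; json.loads decodes them, and ensure_ascii=False asks json.dumps not to escape them in the first place.

```python
import json

name = "அரவிந்த்"
result = {"Name": name}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(result)

# ensure_ascii=False keeps the original characters in the JSON text.
readable = json.dumps(result, ensure_ascii=False)

# Either form decodes back to the same text.
decoded_name = json.loads(escaped)["Name"]

print(escaped)
print(readable)
print(decoded_name)
```

Both JSON strings are valid and equivalent; the escaped form is just safer to send through channels that only handle ASCII.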
ruohola
  • This is almost-certainly what you want - note the `u` prefixing the string declaration, making it a `unicode` object when created. – ti7 Jun 16 '20 at 17:24

In Python 2.7, a plain str is simply a sequence of bytes (each with a value from 0 through 255) with no associated encoding, and only the ASCII range (0 through 127) is handled unambiguously. If you need to handle and display characters beyond ASCII, you should almost certainly use unicode objects (literals prefixed by u) instead of the naive str (the default for new strings).

In Python 3+, this problem is largely solved: a str is a sequence of Unicode code points, so it can represent any character directly, while raw bytes are a separate type (bytes) that you convert to and from text with an explicit encoding (normally UTF-8). If you can use Python 3, it may solve this and many similar problems related to how strings and characters are stored and displayed.
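
As a rough illustration of that text/bytes split, assuming Python 3 (the variable names are made up for this sketch):

```python
name = "அரவிந்த்"

# A Python 3 str is a sequence of Unicode code points.
code_points = len(name)

# Encoding turns the text into bytes; each of these Tamil
# code points takes three bytes in UTF-8.
encoded = name.encode("utf-8")
byte_count = len(encoded)

# Decoding with the same encoding restores the original text.
round_tripped = encoded.decode("utf-8")

print(code_points, byte_count, round_tripped == name)
```

The str never carries an encoding of its own; the encoding only matters at the point where text is converted to or from bytes.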

If you're forced to use Python 2.7, you should read your text with an explicit encoding and make certain it is loaded as unicode.
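
One way to do that is io.open, which accepts an encoding argument in both Python 2.7 and Python 3. The sketch below is only an assumption about where the text comes from: the file name names.txt is hypothetical, and the file is created in a temporary directory just so the example is self-contained.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical input file, written here only so the sketch can run on its own.
path = os.path.join(tempfile.mkdtemp(), "names.txt")

with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"அரவிந்த்\n")

# Passing encoding= makes read() return decoded text
# (unicode in Python 2.7, str in Python 3) instead of raw bytes.
with io.open(path, encoding="utf-8") as f:
    text = f.read().strip()

print(len(text))
```

The length is counted in code points rather than bytes, which is a quick check that the text really was decoded.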

ti7