
I'm learning to work with JSON by writing a simple Python program that analyzes the Facebook messages I downloaded as JSON. These messages contain plenty of Unicode characters, which are written in the JSON file like this:

pom\u00c3\u00b4\u00c5\u00bee

The example above is supposed to be the word:

pomôže

However, when I work with the string and print the word, it comes out like this:

'pomÃ´Å¾e'

Most online converters printed it the same way, except for this one: https://github.com/mathiasbynens/utf8.js

Is there any way to fix this?

EDIT: Sorry for not being clear enough. Hopefully this makes things clearer: I have a JSON file that looks like this when opened in Notepad++:

{
    "participants": [
        {
            "name": "Person1"
        },
        {
            "name": "Person2"
        }
    ],
    "messages": [
        {
            "sender_name": "Person1",
            "timestamp_ms": 1521492166805,
            "content": "D\u00c3\u00bafam, \u00c5\u00bee pom\u00c3\u00b4\u00c5\u00bee",
            "type": "Generic"
        }
    ]
}

When I try to print or work with the content of the message:

import json
with open("messages.json", "r") as f:
    messages = json.load(f)
    print(messages["messages"][0]["content"])

the string looks like this:

DÃºfam, Å¾e pomÃ´Å¾e

How do I get the text into readable form?

Wranny

1 Answer


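It took me a while to understand, but the reason is quite simple: the same characters can be stored using different character tables. In your file, each \u00XX escape holds one byte of the UTF-8 encoding (ô is the two bytes C3 B4, ž is the two bytes C5 BE), but a \uXXXX escape is supposed to hold the character's Unicode code point, which for these characters is also its UTF-16 code unit (U+00F4 for ô, U+017E for ž). Because of this mismatch, the escapes are read as the wrong characters. To print the word correctly, you have to write it with those code points.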
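Here is a minimal Python 3 sketch of that diagnosis. It assumes, as the sample in the question suggests, that every \u00XX escape in the file is one byte of UTF-8. The snippet compares what json.loads produces with the real UTF-8 bytes of the word. Re-encoding the garbled string as Latin-1 and then decoding it as UTF-8 is a commonly used workaround that this sketch adds; it is a different approach from printing hand-written code points.

```python
import json

# Raw JSON text as it appears in the file (a raw string, so Python
# leaves the backslash escapes for json to interpret).
raw = r'"pom\u00c3\u00b4\u00c5\u00bee"'

# json turns every \u00XX escape into the single character U+00XX.
garbled = json.loads(raw)
print(garbled)  # pomÃ´Å¾e

# The intended word's UTF-8 bytes are exactly those escaped values.
print("pomôže".encode("utf-8"))  # b'pom\xc3\xb4\xc5\xbee'

# Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0..255, so this
# recovers the original bytes, which are then decoded as UTF-8.
fixed = garbled.encode("latin-1").decode("utf-8")
print(fixed)  # pomôže
```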

I'll give you some examples:

In JavaScript:

console.log("pom\u{00f4}\u{017E}e");

In Python 3:

print("pom"+u"\u00F4"+u"\u017E"+"e")
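The next sketch applies the same Latin-1/UTF-8 round trip to the structure from the question's messages.json instead of to hand-written code points. It assumes that every string in the export was written with the byte-per-escape encoding. The helper name fix_mojibake and the inline sample are hypothetical stand-ins for the real file.

```python
import json


def fix_mojibake(value):
    """Recursively re-decode strings whose characters are really UTF-8 bytes."""
    if isinstance(value, str):
        # Each character is one byte (0-255): turn it back into bytes,
        # then decode those bytes as UTF-8.
        return value.encode("latin-1").decode("utf-8")
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    # Numbers, booleans and None are returned unchanged.
    return value


# Hypothetical sample standing in for messages.json from the question.
sample = r'''{"messages": [{"sender_name": "Person1",
  "content": "D\u00c3\u00bafam, \u00c5\u00bee pom\u00c3\u00b4\u00c5\u00bee"}]}'''

messages = fix_mojibake(json.loads(sample))
print(messages["messages"][0]["content"])  # Dúfam, že pomôže
```

With the real file, replace json.loads(sample) with json.load(f) inside with open("messages.json", encoding="utf-8") as f:. If a string already contains characters above U+00FF, encode("latin-1") raises UnicodeEncodeError, so this only suits files in which every string has the same problem.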

In Python 2:

print("pom"+u"\u00F4".encode('utf-8')+u"\u017E".encode('utf-8')+"e")

Python 2.x documentation

Python 3.x documentation