
I am trying to use Python to sort through my downloaded Instagram data. The data is a JSON file, but emoji and other non-text characters are encoded in a way I do not understand, for example:

The JSON file will contain: \u00e2\u009c\u008c\u00f0\u009f\u0096\u00a4\u00f0\u009f\u008d\u0095\u00f0\u009f\u008e\u00b6\u00f0\u009f\u00a4\u00af, which the Instagram app displays as: ✌🖤🍕🎶🤯

Or, in JSON: \u00e2\u0080\u0099. On Instagram: ’ (an apostrophe).
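A minimal sketch of what seems to be going on, assuming the fragments above are copied verbatim from the export: each \u00XX escape looks like a single byte of UTF-8, so after json.loads every character is below U+0100. Encoding that string as Latin-1 recovers the original bytes one-to-one, and decoding those bytes as UTF-8 gives the intended characters. The constant names are made up for illustration.

```python
import json

# Fragments as they appear in the exported file (raw strings keep the backslashes literal).
EMOJI_JSON = r'"\u00e2\u009c\u008c\u00f0\u009f\u0096\u00a4\u00f0\u009f\u008d\u0095\u00f0\u009f\u008e\u00b6\u00f0\u009f\u00a4\u00af"'
APOSTROPHE_JSON = r'"\u00e2\u0080\u0099"'

results = []
for fragment in (EMOJI_JSON, APOSTROPHE_JSON):
    wrong = json.loads(fragment)         # one code point per UTF-8 byte, all <= U+00FF
    raw_bytes = wrong.encode("latin-1")  # Latin-1 maps U+0000..U+00FF one-to-one onto bytes
    right = raw_bytes.decode("utf-8")    # read those bytes as the UTF-8 they really are
    results.append(right)
    print(len(wrong), raw_bytes.hex(" "), [f"U+{ord(c):04X}" for c in right])
```

For the apostrophe, the three "characters" turn out to be the bytes e2 80 99, which is the UTF-8 encoding of U+2019 (RIGHT SINGLE QUOTATION MARK); the emoji fragment decodes to five code points, U+270C, U+1F5A4, U+1F355, U+1F3B6 and U+1F92F.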

I have tried using u"string" and have found similar questions here, here and here, but none of them are in Python or provide any details that are useful to me.

Ani
Elephant

2 Answers


Try calling this on the decoded string:

string.encode('latin-1').decode('utf-8')
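Applying that fix to a whole export means touching every string inside nested dicts and lists. The sketch below is one hypothetical way to do it; fix_mojibake and load_instagram_json are illustrative names, not part of any library. Strings that cannot be round-tripped (for example, text that already contains characters above U+00FF, or Latin-1 text that is not valid UTF-8) are left unchanged rather than raising.

```python
import json
from typing import Any


def fix_mojibake(value: Any) -> Any:
    """Recursively repair every string in a parsed JSON value."""
    if isinstance(value, str):
        try:
            # Code points <= U+00FF -> original bytes -> proper UTF-8 decode.
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not the byte-per-character pattern: keep the string as it is.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value  # numbers, booleans and None need no repair


def load_instagram_json(path: str) -> Any:
    # Read explicitly as UTF-8 instead of relying on the platform default encoding.
    with open(path, encoding="utf-8") as f:
        return fix_mojibake(json.load(f))
```

Walking the parsed structure keeps the fix independent of the export's field names; the trade-off is that a string that legitimately looks like mojibake would also be "repaired".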
Ani
  • Hello, I am trying to write that text to a file, but I get the error: `UnicodeEncodeError: 'charmap' codec can't encode characters in position 12-14: character maps to <undefined>`. How can I fix this? – Hayk Petrosyan Mar 26 '21 at 00:11
  • @HaykPetrosyan Does this answer your question? https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters – Ani Mar 29 '21 at 13:52
  • Kind of. I found out later that Instagram and Twitter texts are encoded in latin-1 or something similar, so we must first decode from that, then encode to utf-8 and do whatever. – Hayk Petrosyan Apr 28 '21 at 21:45
  • @Ani +1 for this. Can you share the source or how you found it? – gavin May 29 '21 at 13:41
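Regarding the 'charmap' error in the comments: it typically appears when open() is called without an encoding and Python falls back to the locale's default (for example cp1252 on many Windows setups), which has no mapping for emoji. A minimal sketch, assuming the text has already been repaired; the helper names and file names are hypothetical.

```python
import json
import os
import tempfile


def write_text_utf8(path: str, text: str) -> None:
    # An explicit encoding avoids the locale default, which may not be able
    # to represent emoji and then raises UnicodeEncodeError: 'charmap' codec ...
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def dump_json_utf8(path: str, data) -> None:
    # ensure_ascii=False writes emoji as real characters instead of \uXXXX escapes;
    # the explicit UTF-8 encoding is what makes that safe.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


# Round trip in a throwaway directory; the file names are arbitrary.
sample = "\u270c\U0001F5A4 it\u2019s fine"
with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "fixed.txt")
    json_path = os.path.join(tmp, "fixed.json")
    write_text_utf8(txt_path, sample)
    dump_json_utf8(json_path, {"content": sample})
    with open(txt_path, encoding="utf-8") as f:
        text_back = f.read()
    with open(json_path, encoding="utf-8") as f:
        json_text = f.read()
    json_back = json.loads(json_text)
```

Reading the files back with the same explicit encoding matters too, otherwise the platform default can garble the text on the way in.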

If you are on Windows, press Win + . to open the emoji picker, then do print("") with the emoji pasted between the quotes. Output:

Liam Hall