I downloaded a 6 GB gz file from Open Library and extracted it on my Ubuntu machine, which produced a 40 GB txt file. Inspecting the start of the file with `head`, I find this string:

"name": "Mawlu\u0304d Qa\u0304sim Na\u0304yit Bulqa\u0304sim"

What encoding is this? Is it possible to get something that is human readable or does it look like it will require the data source to be exported correctly again?

Abs
Looks like ASCII, though it could be anything. The `\u...` things are part of the JSON specification and should be handled by your JSON parser. – Biffen Dec 12 '14 at 15:56

3 Answers

It's the standard escaping of Unicode characters in a JavaScript string literal, which JSON uses as well.

The string is Mawlūd Qāsim Nāyit Bulqāsim.

Sam Greenhalgh

This is plain JSON encoding. Your JSON parser will translate the `\uNNNN` escapes to Unicode characters. See also: json_encode function: special characters
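As a rough illustration of what "your JSON parser will translate" means, here is a minimal Python sketch. It assumes the fragment from the question is wrapped in braces so that it forms a complete JSON object; the real lines in the dump may be laid out differently, and the variable names are only illustrative.

```python
import json

# The fragment from the question, wrapped in braces (an assumption)
# so that it is a complete JSON object. The raw string keeps the
# backslash escapes exactly as they appear in the file.
raw = r'{"name": "Mawlu\u0304d Qa\u0304sim Na\u0304yit Bulqa\u0304sim"}'

# json.loads turns each \uNNNN escape into the real Unicode character.
record = json.loads(raw)
name = record["name"]

# Re-serialising with ensure_ascii=False writes the characters
# themselves instead of escaping them again.
readable = json.dumps(record, ensure_ascii=False)
```

Printing `name` on a UTF-8 terminal should then show the human-readable name, so the data does not need to be exported again.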

MvdD

Looks like Unicode escape sequences:

http://www.charbase.com/0304-unicode-combining-macron

U+0304: COMBINING MACRON

memic