Writing JSON to file gives?

Question

I have a JSON file with various country names, languages, etc. I want to strip it down to just the information I need for what I am doing. For example, I would like to turn

[{
    "name": {
        "common": "Afghanistan",
        "official": "Islamic Republic of Afghanistan",
        "native": {
            "common": "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646",
            "official": "\u062f \u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646 \u0627\u0633\u0644\u0627\u0645\u064a \u062c\u0645\u0647\u0648\u0631\u06cc\u062a"
        }
    },
    "tld": [".af"],
    "cca2": "AF",
    "ccn3": "004",
    "cca3": "AFG",
    "currency": ["AFN"],
    "callingCode": ["93"],
    "capital": "Kabul",
    "altSpellings": ["AF", "Af\u0121\u0101nist\u0101n"],
    "relevance": "0",
    "region": "Asia",
    "subregion": "Southern Asia",
    "nativeLanguage": "pus",
    "languages": {
        "prs": "Dari",
        "pus": "Pashto",
        "tuk": "Turkmen"
    },
    "translations": {
        "cym": "Affganistan",
        "deu": "Afghanistan",
        "fra": "Afghanistan",
        "hrv": "Afganistan",
        "ita": "Afghanistan",
        "jpn": "\u30a2\u30d5\u30ac\u30cb\u30b9\u30bf\u30f3",
        "nld": "Afghanistan",
        "rus": "\u0410\u0444\u0433\u0430\u043d\u0438\u0441\u0442\u0430\u043d",
        "spa": "Afganist\u00e1n"
    },
    "latlng": [33, 65],
    "demonym": "Afghan",
    "borders": ["IRN", "PAK", "TKM", "UZB", "TJK", "CHN"],
    "area": 652230
}, ...

Into

[{
    "name": {
        "common": "Afghanistan",
        "native": {
            "common": "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646"
        }
    },
    "cca2": "AF"
}, ...

But when I try I get

[{
    "name": {
        "common": "Afghanistan",
        "native": {
            "common": "?????????"   <-- NOT WHAT I WANT
        }
    },
    "cca2": "AF"
},

Here is the relevant code I used to strip out what I don't want:

byte[] encoded = Files.readAllBytes(Paths.get("countries.json"));
String JSONString =  new String(encoded, Charset.forName("US-ASCII"));
...
Writer writer = new OutputStreamWriter(new FileOutputStream("countriesBetter.json"), "US-ASCII");
writer.write(javaObject.toString());
writer.close();

I cannot figure out why it turns the text into question marks. I have tried several character sets to no avail. When I use UTF-8, I get Ø§Ù�ØºØ§Ù†Ø³ØªØ§Ù†

Please help me. Thank you.
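The UTF-8 garbling described above can be reproduced in a few lines. This is a minimal sketch that assumes the viewer decoded the UTF-8 bytes with a single-byte Western charset; ISO-8859-1 stands in for that charset here, and `MojibakeDemo`/`misread` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8, then decode those bytes with the wrong charset.
    // ISO-8859-1 is an assumed stand-in for the viewer's single-byte charset.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String alef = "\u0627"; // ARABIC LETTER ALEF; its UTF-8 bytes are 0xD8 0xA7
        // In ISO-8859-1, 0xD8 is U+00D8 (O with stroke) and 0xA7 is U+00A7 (section sign).
        System.out.println(misread(alef).equals("\u00D8\u00A7")); // prints "true"
        // Decoding with the same charset that was used for encoding restores the text.
        byte[] utf8 = alef.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(alef)); // prints "true"
    }
}
```

The first two garbled characters in the question, `Ø§`, are exactly what this produces for the first Arabic letter, which suggests the UTF-8 bytes on disk were fine and only the reading side was wrong.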

`new String(encoded, Charset.forName("US-ASCII"));` what do you expect this to do ? — njzk2, Oct 06 '16 at 21:40
`When I use UTF-8 i get Ø§Ù�ØºØ§Ù†Ø³ØªØ§Ù†` The problem here is how you read it. The file is probably fine. — njzk2, Oct 06 '16 at 21:40
And what exactly do you expect `\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646` to look like in a file? — f1sh, Oct 06 '16 at 21:41
`\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646`: the two stripped-down blocks show, respectively, how I want it to look when I open the file in Notepad++ and how it actually looks. — J Blaz, Oct 06 '16 at 21:42
If you want to process JSON text files, use a **JSON parser/generator**. It will know how to write the JSON back out correctly. — Andreas, Oct 06 '16 at 22:23
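Since the goal is for the output to show the `\uXXXX` escapes just like the input file, one option is to escape every non-ASCII character before writing, so the file stays pure ASCII. This is a sketch that assumes the JSON text comes from something like `javaObject.toString()`; `AsciiJsonWriter` and `escapeNonAscii` are hypothetical names, not part of any JSON library.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AsciiJsonWriter {
    // Hypothetical helper: replace each non-ASCII char with a JSON unicode escape
    // (backslash, 'u', four lowercase hex digits). Java strings are UTF-16, so a
    // character outside the BMP becomes two escapes (a surrogate pair), which is
    // also how JSON represents such characters.
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for javaObject.toString() from the question.
        String json = "{\"common\":\"\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646\"}";
        Path out = Paths.get("countriesBetter.json");
        // After escaping, every char is ASCII, so US-ASCII can encode it losslessly.
        try (Writer writer = Files.newBufferedWriter(out, StandardCharsets.US_ASCII)) {
            writer.write(escapeNonAscii(json));
        }
    }
}
```

Run as-is, `countriesBetter.json` should contain the same escaped `\u0627...` form as the input. A JSON generator may offer an equivalent setting; check the documentation of whichever library produced `javaObject`.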

score 1 · Accepted Answer · edited May 23 '17 at 12:15


`\u0627` is a Unicode character, not an ASCII one, and Arabic characters cannot be represented in ASCII at all, hence the question marks. For the differences between the UTF encodings, see Difference between UTF-8 and UTF-16?
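A quick way to see where the `?` comes from: `String.getBytes(Charset)` replaces every character the charset cannot encode with that charset's replacement byte, which for US-ASCII is `?` (0x3F). Judging by the output in the question, the `OutputStreamWriter` behaves the same way. The class name below is illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiReplacementDemo {
    // Encode with US-ASCII; unmappable characters become the replacement byte '?'.
    static byte[] toAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = toAscii("AF \u0627\u0641"); // two ASCII letters, a space, two Arabic letters
        System.out.println(Arrays.toString(bytes)); // prints [65, 70, 32, 63, 63]
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints AF ??
    }
}
```

Once the Arabic letters have been turned into `?` bytes, the original text is gone; no charset chosen when reading the file back can recover it.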

When you write the file as UTF-8, whatever opens it (Notepad++, for example) must also read it as UTF-8 so that it knows how to display the bytes. If you read it back into Java using that same encoding, the text will be unaltered.
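A minimal sketch of writing and reading with the same charset, assuming Java 8 or later (`java.nio.file.Files`); the temporary file name and the `Utf8RoundTrip` class are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {
    // Write text to a temporary file as UTF-8, then read it back with the same charset.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("countries", ".json");
            try {
                try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file); // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String common = "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646";
        System.out.println(roundTrip(common).equals(common)); // prints "true"
    }
}
```

Naming the charset explicitly on both sides avoids depending on the platform default encoding, which is what usually differs between machines and tools.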


answered Oct 06 '16 at 22:30

stevegal


score 0 · Answer 2 · answered Oct 06 '16 at 22:23

You need to change the console encoding to UTF-8 to see these characters.

In Eclipse, go to Run > Run Configurations.

A dialog will open. Select the Common tab. In the Encoding section, select Other and choose UTF-8 from the drop-down.

Now run the program. I got the following result:

[ {
  "name" : {
    "common" : "Afghanistan",
    "official" : "Islamic Republic of Afghanistan",
    "natives" : {
      "common" : "افغانستان",
      "official" : "د افغانستان اسلامي جمهوریت"
    }
  },
  "tld" : [ ".af" ],
  "cca2" : "AF",
  "ccn3" : "004",
  "cca3" : "AFG",
  "currency" : [ "AFN" ],
  "callingCode" : [ "93" ],
  "capital" : "Kabul",
  "altSpellings" : [ "AF", "Afġānistān" ],
  "relevance" : "0",
  "region" : "Asia",
  "subregion" : "Southern Asia",
  "nativeLanguage" : "pus",
  "languages" : {
    "prs" : "Dari",
    "pus" : "Pashto",
    "tuk" : "Turkmen"
  },
  "translations" : {
    "cym" : "Affganistan",
    "deu" : "Afghanistan",
    "fra" : "Afghanistan",
    "hrv" : "Afganistan",
    "ita" : "Afghanistan",
    "jpn" : "アフガニスタン",
    "nld" : "Afghanistan",
    "rus" : "Афганистан",
    "spa" : "Afganistán"
  },
  "latlng" : [ 33, 65 ],
  "demonym" : "Afghan",
  "borders" : [ "IRN", "PAK", "TKM", "UZB", "TJK", "CHN" ],
  "area" : 652230
} ]
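If you would rather not depend on the IDE setting for the bytes your program writes, one option is to wrap standard output in a `PrintStream` that always encodes as UTF-8. This is a sketch under the assumption that the console itself decodes UTF-8; if it does not, the text will still display incorrectly. `Utf8Console` and `utf8Stream` are hypothetical names.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wrap any OutputStream in a PrintStream that encodes text as UTF-8,
    // independent of the platform default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(new FileOutputStream(FileDescriptor.out));
        // The console still has to decode these bytes as UTF-8 to show the Arabic text.
        out.println("\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646");
    }
}
```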
