I am reading JSON files into a String, sometimes updating them (replacing some specific words with others), and then writing the updated JSON files to a zip. This is what an example input file looks like: [image: example input file]

The issue is that the output "loses" its character escaping and is therefore no longer valid JSON: [image: example output file]

To read the json:

    private InputStream processJsonFile(File file) throws IOException {
        String content;
        if (!file.exists())
            return new ByteArrayInputStream("".getBytes(StandardCharsets.UTF_8));
        // try-with-resources closes the reader even if reading fails
        try (FileReader reader = new FileReader(file, StandardCharsets.UTF_8)) {
            content = IOUtils.toString(reader);
        } catch (IOException | NullPointerException e) {
            Logs.logError("Error while reading file " + file.getPath());
            Logs.logError("It seems to be malformed!");
            return new ByteArrayInputStream("".getBytes(StandardCharsets.UTF_8));
        }

        // here i do things with content

        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }

To add an InputStream to the zip file:

    try (fis) {
        while ((length = fis.read(bytes)) >= 0)
            zos.write(bytes, 0, length);
    }
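
For reference, here is a minimal self-contained sketch of that copy step. The class `ZipCopySketch`, the helper names, and the entry name are hypothetical and only for illustration; the real code may differ. Since the loop copies bytes unchanged, any escape sequences still present in the stream should reach the zip intact, which suggests the escaping is lost before this point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original project.
final class ZipCopySketch {

    // Copies the whole stream into a new zip entry, byte for byte.
    static void addToZip(ZipOutputStream zos, String entryName, InputStream in) throws IOException {
        try (in) {
            zos.putNextEntry(new ZipEntry(entryName));
            byte[] buffer = new byte[8192];
            int length;
            while ((length = in.read(buffer)) >= 0) {
                zos.write(buffer, 0, length);
            }
            zos.closeEntry();
        }
    }

    // Builds an in-memory zip holding one UTF-8 text entry.
    static byte[] zipSingleEntry(String entryName, String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            addToZip(zos, entryName, new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        }
        return out.toByteArray();
    }

    // Reads the first entry of an in-memory zip back as UTF-8 text.
    static String readFirstEntry(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            zis.getNextEntry();
            return new String(zis.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Round-tripping a small JSON string through `zipSingleEntry` and `readFirstEntry` is a quick way to check that the zip step itself leaves the text untouched.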
Th0rgal
  • Maybe this can help: https://stackoverflow.com/questions/11145681/how-to-convert-a-string-with-unicode-encoding-to-a-string-of-letters – Krzysztof Cichocki Oct 02 '22 at 20:57
  • Hmm, I tried ``content = StringEscapeUtils.escapeJson(content);`` but this escaped everything, so the result is no longer valid JSON – Th0rgal Oct 02 '22 at 21:05
  • `// here i do things with content` …Whatever things you’re doing have resulted in something that is no longer valid JSON. From [the JSON specification](https://datatracker.ietf.org/doc/html/rfc7159.html#section-7): “A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, **and the control characters (U+0000 through U+001F).**” If you put a character in the 00-1f range in your string, it *must* be expressed as a Unicode escape sequence. – VGR Oct 02 '22 at 21:29

1 Answer


You're outputting raw Unicode characters when you should be encoding them as Unicode escape sequences, e.g. `\unnnn`.

Use a library to encode the content, e.g.:

    content = EntityUtils.toString(content, "UTF-8");
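
Note that `EntityUtils.toString` in Apache HttpComponents takes an `HttpEntity` as its first argument, so the line above may not compile for a plain `String`. As an alternative, here is a rough JDK-only sketch that rewrites raw control and non-ASCII characters as `\unnnn`, but only inside JSON string literals, so the quotes and structure stay untouched. The class and method names are hypothetical, and it assumes the input is otherwise well-formed JSON text; a JSON library's own writer may well be the better choice.

```java
// Hypothetical helper, a sketch rather than a complete JSON processor.
final class JsonUnicodeEscaper {

    // Escapes raw control and non-ASCII characters found inside string literals.
    static String escapeInsideStrings(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        boolean inString = false;       // true between an opening and a closing quote
        boolean afterBackslash = false; // true right after a backslash inside a string
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (!inString) {
                // Outside strings: copy structure and whitespace unchanged.
                if (c == '"') {
                    inString = true;
                }
                out.append(c);
            } else if (afterBackslash) {
                // Second character of an existing escape such as \n or \": keep it.
                afterBackslash = false;
                out.append(c);
            } else if (c == '\\') {
                afterBackslash = true;
                out.append(c);
            } else if (c == '"') {
                inString = false;
                out.append(c);
            } else if (c < 0x20 || c > 0x7E) {
                // Raw control or non-ASCII character: write it as a Unicode escape.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling `JsonUnicodeEscaper.escapeInsideStrings(content)` just before `content.getBytes(StandardCharsets.UTF_8)` would be one place to try it. Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which JSON permits.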
Bohemian