
Short question: Given a String str = "😭";, output an XML file containing <tag>😭</tag> instead of <tag>&#128557;</tag>

I am trying to create an XML file in Java that may contain normal text or emoji within a tag. The XML file is UTF-8 encoded, so when it is opened in Notepad++ you should see both normal text and emoji inside a tag. However, when I tested my code, the emoji was written as a numeric character reference (&#xxxxxx;) instead.
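The number in &#128557; is the decimal Unicode code point of the emoji, U+1F62D (LOUDLY CRYING FACE). This is inferred from the reference in the question, since the literal emoji did not survive in the text. That code point lies outside the Basic Multilingual Plane, so a Java String stores it as a surrogate pair of two chars. Here is a minimal sketch showing the arithmetic. The class name and the toCharRefs helper are hypothetical and for illustration only.

```java
public class EmojiCodePoint {

    // Hypothetical helper: spells every code point in s as a decimal numeric
    // character reference, e.g. "\uD83D\uDE2D" -> "&#128557;".
    // A real serializer only escapes the characters it decides need escaping.
    static String toCharRefs(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F62D written as a surrogate pair so the example does not
        // depend on the encoding of the .java source file
        String str = "\uD83D\uDE2D";
        System.out.println(str.length());                            // 2 UTF-16 chars
        System.out.println(str.codePointCount(0, str.length()));     // 1 code point
        System.out.println(Integer.toHexString(str.codePointAt(0))); // 1f62d
        System.out.println(toCharRefs(str));                         // &#128557;
    }
}
```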

Sample code:

String str = "😭";
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element root = document.createElement("tag");
root.appendChild(document.createTextNode(str));
document.appendChild(root);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(document), new StreamResult(new File("test.xml")));
user1589188
Xalan encodes emojis properly using UTF-16, rather than UTF-8. Try: `transformer.setOutputProperty(ENCODING, UTF_16.toString());` – Dave Jarvis Apr 01 '22 at 18:09
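The following is a sketch of the comment's suggestion. It writes the same document with UTF-16 output encoding, but to an in-memory byte array rather than test.xml, so the result is easy to inspect. Whether the emoji then appears literally depends on the JDK's built-in serializer. Instead of assuming, the sketch decodes and prints what was written, then checks that parsing it back yields the original emoji. The class and helper names (Utf16Output, serialize, readTag) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utf16Output {

    // Builds <tag>text</tag> and serializes it with the given output encoding.
    static byte[] serialize(String text, String encoding) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = document.createElement("tag");
        root.appendChild(document.createTextNode(text));
        document.appendChild(root);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toByteArray();
    }

    // Parses the bytes back (the parser detects the encoding itself)
    // and returns the text content of the root element.
    static String readTag(byte[] xml) throws Exception {
        Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return parsed.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String str = "\uD83D\uDE2D"; // U+1F62D
        byte[] xml = serialize(str, StandardCharsets.UTF_16.name());
        // Show exactly what the serializer wrote, escaped or not
        System.out.println(new String(xml, StandardCharsets.UTF_16));
        System.out.println(readTag(xml).equals(str));
    }
}
```

The trade-off is that the file is no longer UTF-8, which is the encoding the question asks for.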

1 Answer


By default, the transformer writes emoji as numeric character references such as &#128557;, but you can prevent this by embedding processing instructions that disable output escaping. Here's an example using your code with just two extra lines, both calling the Document method createProcessingInstruction(): the first disables escaping and the second re-enables it:

package com.unthreading.emojitoxml;

import java.io.File;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.OutputKeys;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;

public class App {

    public static void main(String[] args) throws ParserConfigurationException, TransformerException {

        String str = "😭";
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = document.createElement("tag");
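        // While escaping is disabled, & and < in text nodes are not escaped either,
        // so only wrap text that is known to contain no markup characters.
        document.appendChild(document.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, "")); // <=== ADD THIS LINE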
        root.appendChild(document.createTextNode(str));
        document.appendChild(root);
        document.appendChild(document.createProcessingInstruction(StreamResult.PI_ENABLE_OUTPUT_ESCAPING, "")); // <=== ADD THIS LINE
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.transform(new DOMSource(document), new StreamResult(new File("test.xml")));
    }
}

This is the content of test.xml after running that code:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><tag>😭</tag>
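The escaped and literal forms are the same XML, so any conforming parser expands &#128557; (or its hex form &#x1F62D;) back into the emoji. Disabling escaping therefore changes how the file looks in an editor, not the data a parser reads from it. The minimal sketch below uses the JDK's built-in DOM parser to show this. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class CharRefEquivalence {

    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String emoji = "\uD83D\uDE2D"; // U+1F62D as a surrogate pair
        String decimal = rootText("<tag>&#128557;</tag>");
        String hex = rootText("<tag>&#x1F62D;</tag>");
        String literal = rootText("<tag>" + emoji + "</tag>");
        // All three spellings decode to the same string
        System.out.println(decimal.equals(emoji) && hex.equals(emoji) && literal.equals(emoji)); // true
    }
}
```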

Notes:

skomisa