I have a formatted XML file, and I want to convert it to a one-line string. How can I do that?

Sample XML:

<?xml version="1.0" encoding="UTF-8"?>
<books>
   <book>
       <title>Basic XML</title>
       <price>100</price>
       <qty>5</qty>
   </book>
   <book>
     <title>Basic Java</title>
     <price>200</price>
     <qty>15</qty>
   </book>
</books>

Expected output:

<?xml version="1.0" encoding="UTF-8"?><books><book><title>Basic XML</title><price>100</price><qty>5</qty></book><book><title>Basic Java</title><price>200</price><qty>15</qty></book></books>

11 Answers

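// filename is the file path string
// needs: import java.io.BufferedReader; import java.io.FileReader;
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    String line;
    while ((line = br.readLine()) != null) {
        // readLine() already drops the line break; trim() drops the indentation
        sb.append(line.trim());
    }
}
String oneLine = sb.toString();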

Using a StringBuilder is more efficient than string concatenation: http://kaioa.com/node/59
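A self-contained version of the loop above as a reusable method. It swaps FileReader for java.nio.file.Files.newBufferedReader so the charset is explicit (UTF-8 is an assumption, matching the sample's declaration), and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class XmlOneLine {

    // Hypothetical helper: reads the file line by line, trims each line and
    // appends it to a StringBuilder. readLine() already drops the line break;
    // trim() drops the indentation around each line.
    public static String readAsOneLine(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line.trim());
            }
        }
        return sb.toString();
    }
}
```

Call it as XmlOneLine.readAsOneLine(Paths.get("books.xml")). Like the loop above, it glues together anything that was split across lines: a title wrapped as "Basic" / "XML" on two lines would come out as "BasicXML".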


Run it through an XSLT identity transform with <xsl:output indent="no"/> and <xsl:strip-space elements="*"/>:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="no" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

It removes all insignificant whitespace and produces the expected output you posted.

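// 1. Read xml from file to StringBuilder (StringBuffer)
// 2. call s = stringBuffer.toString()
// 3. remove all "\n" and "\t" (replaceAll returns a new String, so assign the result back):
s = s.replaceAll("\n", "");
s = s.replaceAll("\t", "");
// files with Windows line endings also contain "\r"
s = s.replaceAll("\r", "");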

Edit: I made a small mistake; it is better to use StringBuilder in your case (I assume you don't need the thread safety of StringBuffer).
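Note that the sample is indented with spaces, so removing only "\n" and "\t" still leaves runs like "<books>   <book>". A common variant (not what this answer proposed) deletes any whitespace run that sits directly between a ">" and a "<". A sketch, with a hypothetical class name; it assumes whitespace between tags is never significant, so mixed content such as "<b>a</b> <i>b</i>" would lose its space.

```java
public class StripBetweenTags {

    // Removes every whitespace run (spaces, tabs, \r, \n) that sits directly
    // between the end of one tag and the start of the next.
    // Text such as "Basic XML" is untouched because it is not between '>' and '<'.
    public static String compact(String xml) {
        return xml.replaceAll(">\\s+<", "><").trim();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<books>\n"
                + "   <book>\n"
                + "       <title>Basic XML</title>\n"
                + "   </book>\n"
                + "</books>\n";
        // prints <?xml version="1.0" encoding="UTF-8"?><books><book><title>Basic XML</title></book></books>
        System.out.println(compact(xml));
    }
}
```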


In Java 1.8 and above:

BufferedReader br = new BufferedReader(new FileReader(filePath));
// join with an empty separator so the line breaks are dropped (indentation is kept)
String content = br.lines().collect(Collectors.joining());
br.close();
    If the OP wants to minify the XML, something like this might work for most documents: `reader.lines().map(String::trim).collect(Collectors.joining());`. Note: it would likely fail in cases where element attributes are split over multiple lines. – Gediminas Rimsa Oct 15 '20 at 13:17

Open and read the file.

BufferedReader r = new BufferedReader(new FileReader(filename));
String ret = "";
String s;
while ((s = r.readLine()) != null)
{
  ret += s;
}
r.close();
return ret;
  • ret +=s :(( don't do that, better use StringBuffer – lukastymo Apr 01 '11 at 08:55
  • @smas :P it's not real code. I still haven't figured out how to format properly on this site, so I went for the most concise way. The idea still holds (if you import the relevant libraries, set up variables like `filename`, and add `try{} catch{}` blocks) –  Apr 01 '11 at 08:58
  • don't use string concat or stringbuffer as smas suggests, use StringBuilder http://kaioa.com/node/59 – ant Apr 01 '11 at 09:01

Starting from this answer, which provides code that uses Dom4j for pretty-printing, change the line that sets the output format from createPrettyPrint() to createCompactFormat():

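// needs dom4j and Apache Commons Lang (commons-lang3 assumed) on the classpath
import java.io.StringWriter;
import org.apache.commons.lang3.StringUtils;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public String unPrettyPrint(final String xml){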

    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in unPrettyPrint()");
    }

    final StringWriter sw;

    try {
        final OutputFormat format = OutputFormat.createCompactFormat();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error un-pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}

The underscore-java library has a static method, U.formatXml(xmlString); passing Xml.XmlStringBuilder.Step.COMPACT as a second argument produces single-line output. Example:

import com.github.underscore.U;
import com.github.underscore.Xml;

public class MyClass {
    public static void main(String[] args) {
        System.out.println(U.formatXml("<a>\n  <b></b>\n  <b></b>\n</a>",
        Xml.XmlStringBuilder.Step.COMPACT));
    }
}

// output: <a><b></b><b></b></a>

The above solutions work if you are compressing all white space in the XML document. Other quick options are JDOM (using Format.getCompactFormat()) and dom4j (using OutputFormat.createCompactFormat()) when outputting the XML document.

However, I had a specific requirement to preserve the white space inside each element's text value, and these solutions did not do what I needed. All I needed was to remove the pretty-print formatting added to the XML document.

My solution can be explained as the following three-step regex process, split into separate steps to make the algorithm easier to follow:

String regex, updatedXml;

// 1. remove all white space preceding a begin element tag:
regex = "[\\n\\s]+(\\<[^/])";
updatedXml = originalXmlStr.replaceAll( regex, "$1" );

// 2. remove all white space following an end element tag:
regex = "(\\</[a-zA-Z0-9-_\\.:]+\\>)[\\s]+";
updatedXml = updatedXml.replaceAll( regex, "$1" );

// 3. remove all white space following an empty element tag
// (<some-element xmlns:attr1="some-value".... />):
regex = "(/\\>)[\\s]+";
updatedXml = updatedXml.replaceAll( regex, "$1" );

NOTE: The code is Java; '$1' in the replacement string refers to the first capture group.

This will simply remove the white space used when adding the 'pretty-print' format to an XML document, yet preserve all other white space when it is part of the element text value.
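The three steps above can be combined into one method. This is a sketch: the class and method names are hypothetical, [\n\s] is shortened to \s (which already matches newlines), and the tag-name class [a-zA-Z0-9-_\.:] is written as the equivalent [\w.:-]. One limitation worth knowing: step 1 also removes a space that precedes an inline start tag in mixed content, so "text <b>bold</b>" becomes "text<b>bold</b>".

```java
public class UnPrettyPrint {

    // Step 1: whitespace before a start tag (any '<' not followed by '/').
    private static final String BEFORE_START_TAG = "\\s+(<[^/])";
    // Step 2: whitespace after an end tag such as </title> or </ns:item>.
    private static final String AFTER_END_TAG = "(</[\\w.:-]+>)\\s+";
    // Step 3: whitespace after a self-closing tag such as <item/>.
    private static final String AFTER_EMPTY_TAG = "(/>)\\s+";

    // Applies the three regexes in order; "$1" puts the captured tag back.
    public static String unPrettyPrint(String xml) {
        String updated = xml.replaceAll(BEFORE_START_TAG, "$1");
        updated = updated.replaceAll(AFTER_END_TAG, "$1");
        updated = updated.replaceAll(AFTER_EMPTY_TAG, "$1");
        return updated;
    }

    public static void main(String[] args) {
        String xml = "<a>\n  <b> keep  these  spaces </b>\n  <c/>\n</a>";
        // prints <a><b> keep  these  spaces </b><c/></a>
        System.out.println(unPrettyPrint(xml));
    }
}
```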


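I guess you want to read the XML in, ignore the white space, and write it out again. Most XML packages have an option to ignore white space. For example, DocumentBuilderFactory has setIgnoringElementContentWhitespace for this purpose, although it only takes effect when the parser is validating (setValidating(true)) against a DTD that declares element-only content.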

Similarly, if you are generating the XML by marshalling an object, JAXB has the Marshaller.JAXB_FORMATTED_OUTPUT property; leave it at false (the default) to get unindented output.
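A JDK-only sketch that does not rely on a DTD (a substitution for the flag above, not something this answer described): parse with DocumentBuilder, delete whitespace-only text nodes yourself, then serialize with an identity Transformer that does not indent. The class and method names are hypothetical, and it assumes whitespace-only text between elements is never meaningful in your documents.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomCompact {

    // Recursively deletes text nodes that contain only whitespace
    // (the indentation between tags); text with real content is kept as-is.
    static void removeWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // grab before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                removeWhitespaceNodes(child);
            }
            child = next;
        }
    }

    public static String compact(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        removeWhitespaceNodes(doc);

        // Identity transform: copies the DOM to text without adding indentation.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "no");
        // Drop the <?xml ...?> declaration; remove this line to keep it.
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>\n  <book>\n    <title>Basic XML</title>\n  </book>\n</books>";
        // prints <books><book><title>Basic XML</title></book></books>
        System.out.println(compact(xml));
    }
}
```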


Below is a ready-made solution that uses only the Java 1.8 standard library.

XSLT:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="no"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Java:

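import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public static String convertXmlToOneLine(String xml) throws TransformerException {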
    final String xslt =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n" +
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n" +
        "    <xsl:output indent=\"no\"/>\n" +
        "    <xsl:strip-space elements=\"*\"/>\n" +
        "    <xsl:template match=\"@*|node()\">\n" +
        "        <xsl:copy>\n" +
        "            <xsl:apply-templates select=\"@*|node()\"/>\n" +
        "        </xsl:copy>\n" +
        "    </xsl:template>\n" +
        "</xsl:stylesheet>";

    /* prepare XSLT transformer from String */
    Source xsltSource = new StreamSource(new StringReader(xslt));
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(xsltSource);

    /* where to read the XML? */
    Source source = new StreamSource(new StringReader(xml));

    /* where to write the XML? */
    StringWriter stringWriter = new StringWriter();
    Result result = new StreamResult(stringWriter);

    /* transform XML to one line */
    transformer.transform(source, result);

    return stringWriter.toString();
}

Sample output (with the XSLT stylesheet itself passed in as the input XML):

<?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"><xsl:output indent="no"/><xsl:strip-space elements="*"/><xsl:template match="@*|node()"><xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy></xsl:template></xsl:stylesheet>

License: The MIT License

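String content = FileUtils.readFileToString(new File(fileName), StandardCharsets.UTF_8);

(Apache Commons IO. This reads the whole file into one String, but the line breaks and indentation are still in it.)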

  • The link even says that the method is deprecated. I wouldn't recommend using it when a simple buffered read with trim() would suffice – Grambot Aug 27 '13 at 19:39