I have a Java application that fetches an XML file from a website. The file has a multi-line text node like this:

<root>
   <node>info</node>
   <mlnode>some
multi
line
text</mlnode>
</root>

My code currently looks like this:

    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    NodeList nList = doc.getElementsByTagName("mlnode");
    Node nNode = nList.item(0);

    System.out.println(nNode.getTextContent());

Sadly, this code puts my multi-line content on one line: somemultilinetext. I'd like to preserve the line breaks, and I'd like them preserved on all operating systems (primarily Windows and Linux).

How do I do that?

Edit: This question is NOT about correct indenting. It's only about keeping the linebreaks in the contents of the nodes. It's important to keep the linebreaks where they were because the content of that node is part of a configuration file and has to be separated by linebreaks.
I don't care about correct indenting (and if I did: I know there are enough sources on SO and other forums that explain how to correctly indent).

wullxz
  • My question is not about indenting. It's only about the linebreaks in the text of a node. I don't care about correct indenting. – wullxz Mar 12 '13 at 01:54
  • Your code works for me, using Java 1.7.0_17. It prints the text content with line breaks preserved. – VGR Mar 12 '13 at 02:16
  • Strange. I did the test on Linux with OpenJDK 1.7.0_09. Maybe it's platform-dependent? What was your OS? – wullxz Mar 12 '13 at 02:51
  • Works just fine on Mac OSX 10.7 with OpenJDK 1.7.0_04-ea. Are you using a third party DOM parser? – Perception Mar 12 '13 at 05:19
  • I'm using the classes in org.w3c.dom. – wullxz Mar 12 '13 at 12:05

3 Answers

The line breaks are being parsed and output as is.
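
A quick way to check this claim is to parse a small document and make the line feeds visible. The following is only a minimal sketch: the class name `LineBreakCheck` and the helper `mlnodeText` are made up for illustration, and the XML string is a shortened version of the one in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question.
class LineBreakCheck {
    // Parses the XML string and returns the text of the first <mlnode> element.
    static String mlnodeText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("mlnode").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><mlnode>some\nmulti\nline\ntext</mlnode></root>";
        // Replace each line feed with a visible "\n" marker before printing.
        System.out.println(mlnodeText(xml).replace("\n", "\\n"));
    }
}
```

If the markers show up in the output, the parser kept the line breaks and the loss must happen somewhere before the string reaches the parser.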

Your source XML does not appear to contain the two-character end-of-line sequence that Windows uses. I guess you will have to find the line feeds ("\n") and replace them with a carriage return followed by a line feed ("\r\n").

Since the additional "\r" will show up as ^M on Unix systems, you should detect the OS in the JVM, ensure it is Windows and do the replacement.
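
One possible way to do the replacement without an explicit OS check is to ask the JVM for its line separator. This is a sketch under that assumption, not a tested fix for the question: the class `LineEndings` and the method `withSeparator` are hypothetical names, and a lone "\r" (old Mac style) is deliberately not handled.

```java
// Hypothetical helper class for illustration only.
class LineEndings {
    // Converts every "\r\n" or "\n" in the text to the given separator.
    static String withSeparator(String text, String separator) {
        // First collapse "\r\n" to "\n" so existing Windows line endings are not doubled.
        return text.replace("\r\n", "\n").replace("\n", separator);
    }

    public static void main(String[] args) {
        String text = "some\nmulti\nline\ntext";
        // System.lineSeparator() (Java 7+) is "\r\n" on Windows and "\n" on Linux.
        System.out.println(withSeparator(text, System.lineSeparator()));
    }
}
```

Because the separator comes from the JVM, the same call should leave the text unchanged on Linux and insert the extra "\r" only on Windows.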

Akber Choudhry

I've found the error, and it wasn't in the code I posted in my question.
The problem was in the code that fetches the content of a website (an XML file) and turns it into a string. I used the following code, adapted from this question:

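import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

URL url;
InputStream is = null;
DataInputStream dis;
String line;
String content = "";  // has to be initialized before "+=" can be used

try {
    url = new URL("https://stackoverflow.com/");
    is = url.openStream();  // throws an IOException
    dis = new DataInputStream(new BufferedInputStream(is));

    // Note: DataInputStream.readLine() is deprecated and strips the line terminator
    while ((line = dis.readLine()) != null) {
        content += line; // I forgot to add the linebreak here!
        // Adding the following line solved my problem:
        content += System.getProperty("line.separator");
    }
} catch (MalformedURLException mue) {
    mue.printStackTrace();
} catch (IOException ioe) {
    ioe.printStackTrace();
} finally {
    try {
        if (is != null) {  // openStream() may have failed before "is" was assigned
            is.close();
        }
    } catch (IOException ioe) {
        // nothing to see here
    }
}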

Sorry, I didn't think the problem could be in that other class, so I didn't add that code to my question.
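
For comparison, the same idea can be written with `BufferedReader`, which is the usual replacement for the deprecated `DataInputStream.readLine()`. This is only a sketch: `LineJoiner` and `readAll` are hypothetical names, and a `StringReader` stands in for the network stream. For a real URL the reader would presumably be built with `new InputStreamReader(url.openStream(), charset)`, where the right charset is an assumption to verify.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration only.
class LineJoiner {
    // Reads all lines from the reader and puts a "\n" back after each one,
    // because readLine() returns the line without its terminator.
    static String readAll(Reader source) {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // A StringReader stands in for the stream from the website.
        String xml = readAll(new StringReader("<mlnode>some\nmulti\nline\ntext</mlnode>"));
        System.out.print(xml);
    }
}
```

The `StringBuilder` avoids the repeated string copies of `content += line`, and try-with-resources closes the reader even when reading fails.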

wullxz

You can change the code as follows:

Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
NodeList nList = doc.getElementsByTagName("mlnode");
Node nNode = nList.item(0);
String value = nNode.getTextContent();

// Split the text at each line feed and re-join it with the platform's line separator
StringBuilder stb = new StringBuilder();
String[] strAry = value.split("\n");
for (int k = 0; k < strAry.length; k++) {
    stb.append(strAry[k]);
    stb.append(System.getProperty("line.separator"));
}
System.out.println(stb.toString());

This should print the text the way you need it.

Sachini Samarasinghe