How to retrieve an Element's mixed children as text (JDOM)

Question

I have an XML like the following:

<documentation>
    This value must be <i>bigger</i> than the other.
</documentation>

Using JDOM, I can get the following text structures:

Document d = new SAXBuilder().build( new StringReader( s ) );
System.out.printf( "getText:          '%s'%n", d.getRootElement().getText() );
System.out.printf( "getTextNormalize: '%s'%n", d.getRootElement().getTextNormalize() );
System.out.printf( "getTextTrim:      '%s'%n", d.getRootElement().getTextTrim() );
System.out.printf( "getValue:         '%s'%n", d.getRootElement().getValue() );

which give me the following outputs:

getText:          '
    This value must be  than the other.
'
getTextNormalize: 'This value must be than the other.'
getTextTrim:      'This value must be  than the other.'
getValue:         '
    This value must be bigger than the other.
'

What I really wanted was to get the content of the element as a string, namely, "This value must be <i>bigger</i> than the other.". getValue() comes close but removes the <i> tag. I guess I wanted something like innerHTML for XML documents...

Should I just use an XMLOutputter on the contents? Or is there a better alternative?

Look at Prashant Bhate's solution on this page, as I think it's the answer you're looking for: http://stackoverflow.com/questions/7910474/how-to-get-node-contents-from-jdom — james.garriss, Jun 21 '13 at 18:34

score 0 · Answer 1 · edited May 23 '17 at 11:48

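In JDOM, roughly:

// Assumes JDOM's Element and Text are imported and d is the parsed Document
for (Object o : d.getRootElement().getContent()) {
    if (o instanceof Element) {
        Element e = (Element) o;
        // getText() returns only e's direct text, so nested markup is lost
        System.out.print("<" + e.getName() + ">" + e.getText() + "</" + e.getName() + ">");
    } else if (o instanceof Text) {
        System.out.print(((Text) o).getText());
    }
}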

However, as Prashant Bhate wrote, content.getText() returns only the immediate text, so it is only useful for leaf elements with text content.
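The leaf-only limitation goes away if each child is serialized as a whole subtree instead of being rebuilt from its name and text. Here is a minimal sketch of that idea. Note that it uses the JDK's built-in DOM and Transformer APIs rather than JDOM, so that it runs without extra dependencies; the class and method names (InnerXml, innerXml) are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: an "innerHTML for XML" built on the JDK only.
final class InnerXml {

    /** Serializes every child node of the root element, markup included. */
    static String innerXml(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        // Without this, each serialized child would start with an XML declaration
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        // Text nodes and element subtrees are written in document order
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            serializer.transform(new DOMSource(child), new StreamResult(out));
        }
        return out.toString();
    }
}
```

Calling InnerXml.innerXml(s) on the document from the question should return the text with the <i> element intact, including the surrounding whitespace; call trim() on the result if that is unwanted. With JDOM itself, the closest counterpart is probably new XMLOutputter().outputString(d.getRootElement().getContent()), which is what the question already suspected.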

edited May 23 '17 at 11:48 by Community · answered Oct 07 '13 at 08:12 by Renaud

score -1 · Answer 2 · answered Apr 29 '11 at 14:18

Jericho HTML is great for this sort of task. You can accomplish exactly what you're trying to do with a code block like this:

String snippet = new Source(html).getFirstElement().getContent().toString();

It's also great for working with HTML in general because it doesn't try to force it into being XML; it handles the markup much more leniently.

answered Apr 29 '11 at 14:18 by stevevls

This is interesting, I'll give it a shot sometime. Right now I'd rather avoid adding another dependency to the project... – st.never Apr 29 '11 at 14:31
He has a JDom document, not HTML. XML != HTML. – james.garriss Jun 21 '13 at 18:25
@james.garriss Of course HTML and XML are different. My point was that one could use Jericho to simplify a task that can be annoying to deal with via cumbersome XML APIs. – stevevls Jun 21 '13 at 18:55

score -2 · Accepted Answer · answered Apr 29 '11 at 14:18

I'd say you should change your document to

<documentation>
  <![CDATA[This value must be <i>bigger</i> than the other.]]>
</documentation>

in order to adhere to the XML specification. Otherwise <i> would be considered a child element of <documentation> and not content.
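To show what the CDATA wrapper changes for a parser, here is a small sketch. It uses the JDK's built-in DOM API rather than JDOM so that it runs without extra dependencies; the class and method names (CdataText, rootText, descendantElements) are made up for the example. With JDOM, getText() on the CDATA variant should likewise return the markup as plain characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper comparing mixed content with CDATA-wrapped content.
final class CdataText {

    private static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Character data of the root element, CDATA sections included. */
    static String rootText(String xml) throws Exception {
        // Inside CDATA, "<i>" is plain characters, so it survives as text
        return root(xml).getTextContent();
    }

    /** Number of elements nested anywhere below the root element. */
    static int descendantElements(String xml) throws Exception {
        // "*" matches every element name
        return root(xml).getElementsByTagName("*").getLength();
    }
}
```

For the CDATA version, rootText returns the sentence with <i>bigger</i> still in it and descendantElements returns 0. For the original mixed-content version, the <i> markup is missing from the text and one descendant element is reported instead.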

answered Apr 29 '11 at 14:18 by Thomas

I guess this might indeed be the quickest way. Will try. On a side note, however, that documentation element is an `xsd:documentation`, whose content is declared as `any`, so the example is technically correct... – st.never Apr 29 '11 at 14:34
A child node in the middle of mixed content does NOT imply that the node is not actually a node. – james.garriss Jun 21 '13 at 18:26
