
I have an XML like the following:

<documentation>
    This value must be <i>bigger</i> than the other.
</documentation>

Using JDOM, I can get the following text structures:

Document d = new SAXBuilder().build( new StringReader( s ) );
System.out.printf( "getText:          '%s'%n", d.getRootElement().getText() );
System.out.printf( "getTextNormalize: '%s'%n", d.getRootElement().getTextNormalize() );
System.out.printf( "getTextTrim:      '%s'%n", d.getRootElement().getTextTrim() );
System.out.printf( "getValue:         '%s'%n", d.getRootElement().getValue() );

which give me the following outputs:

getText:          '
    This value must be  than the other.
'
getTextNormalize: 'This value must be than the other.'
getTextTrim:      'This value must be  than the other.'
getValue:         '
    This value must be bigger than the other.
'

What I really wanted was to get the content of the element as a string, namely, "This value must be <i>bigger</i> than the other.". getValue() comes close but removes the <i> tag. I guess I wanted something like innerHTML for XML documents...

Should I just use an XMLOutputter on the contents? Or is there a better alternative?

st.never

3 Answers


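In JDOM, something along these lines:

for (Object o : d.getRootElement().getContent()) {
    if (o instanceof Element) {
        Element e = (Element) o;
        // Re-wrap the child's immediate text in its own tag
        System.out.print("<" + e.getName() + ">" + e.getText() + "</" + e.getName() + ">");
    } else if (o instanceof Text) {
        System.out.print(((Text) o).getText());
    }
}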

However, as Prashant Bhate wrote, getText() returns only an element's immediate text, so this works only when the child elements are leaf elements with text content.
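
For nested markup, one option is to serialize each child node instead of rebuilding the tags by hand. Below is a minimal sketch that uses only the JDK's DOM and Transformer classes rather than JDOM, so it is a swapped-in technique and not part of the answer above. The class name InnerXml and the method innerXml are made up for illustration. With JDOM itself, the XMLOutputter route from the question (outputString on the element's getContent() list) is probably the closer equivalent, but that is not tested here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, JDK-only: not a JDOM API.
class InnerXml {

    /** Serializes every child node of the root element, keeping nested markup. */
    static String innerXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            // Identity transformer; suppress the <?xml ...?> header on each fragment
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // Text nodes come out as text, elements with their full subtree
                t.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize element content", e);
        }
    }

    public static void main(String[] args) {
        String s = "<documentation>This value must be <i>bigger</i> than the other.</documentation>";
        System.out.println(innerXml(s));
    }
}
```

Because each child is serialized as a whole subtree, elements nested inside <i> would be kept as well, which is the case the loop above does not cover.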

Renaud

Jericho HTML is great for this sort of task. You can accomplish exactly what you're trying to do with a code block like this:

String snippet = new Source(html).getFirstElement().getContent().toString();

It's also great for working with HTML in general because it doesn't try to force it into being XML; it deals with it much more leniently.

stevevls
  • This is interesting, I'll give it a shot sometime. Right now I'd rather avoid adding another dependency to the project... – st.never Apr 29 '11 at 14:31
  • He has a JDom document, not HTML. XML != HTML. – james.garriss Jun 21 '13 at 18:25
  • @james.garriss Of course HTML and XML are different. My point was that one could use Jericho to simplify a task that can be annoying to deal with via cumbersome XML APIs. – stevevls Jun 21 '13 at 18:55

I'd say you should change your document to

<documentation>
  <![CDATA[This value must be <i>bigger</i> than the other.]]>
</documentation>

in order to adhere to the XML specification. Otherwise <i> would be considered a child element of <documentation> and not content.
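
To see what the CDATA wrapper changes, here is a small sketch that compares the text content of both variants. It uses the JDK's built-in DOM parser instead of JDOM (an assumption made so that it runs without extra dependencies), and the names CdataText and rootText are made up for illustration. JDOM's getTextTrim() would be expected to behave similarly for the CDATA variant, though that is not verified here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, JDK-only: not a JDOM API.
class CdataText {

    /** Returns the trimmed text content of the root element. */
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent()
                    .trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String mixed = "<documentation>This value must be <i>bigger</i> than the other.</documentation>";
        String cdata = "<documentation><![CDATA[This value must be <i>bigger</i> than the other.]]></documentation>";
        // Mixed content: <i> is parsed as a child element, so its tags are gone
        System.out.println(rootText(mixed)); // This value must be bigger than the other.
        // CDATA: the markup is plain character data and survives as-is
        System.out.println(rootText(cdata)); // This value must be <i>bigger</i> than the other.
    }
}
```

The trade-off is that inside CDATA the <i> is no longer an element at all, so any consumer that wants the emphasis as structure has to parse the string again.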

Thomas
  • I guess this might indeed be the quickest way. Will try. On a side note, however, that documentation element is an `xsd:documentation`, whose content is declared as `any`, so the example is technically correct... – st.never Apr 29 '11 at 14:34
  • A child node in the middle of mixed content does NOT imply that the node is not actually a node. – james.garriss Jun 21 '13 at 18:26