3

This question may have been answered before in some dark recess of the Interwebs, but I couldn't even figure out how to form a meaningful Google query to search for it.

So: Suppose I have a (simplified) XML document like so:

<root>
  <tag1>Value</tag1>
  <tag2>Word</tag2>
  <tag3>
    <something1>Foo</something1>
    <something2>Bar</something2>
    <something3>Baz</something3>
  </tag3>
</root>

I know how to use JAXB to unmarshal this into a Java Object in the standard use cases.

What I don't know how to do is unmarshal tag3's contents wholesale into a String. By which I mean:

<something1>Foo</something1>
<something2>Bar</something2>
<something3>Baz</something3>

as a String, tags and all.

Random Human

2 Answers

1

Use the @XmlAnyElement annotation. I was looking for the same thing and expected to find an annotation that stops JAXB from parsing the content into a DOM and leaves it as is, but I did not find one.

Details: "Using JAXB to extract inner text of XML element" and http://blog.bdoughan.com/2011/04/xmlanyelement-and-non-dom-properties.html. I added one check to the getElement() method; without it we could get an IndexOutOfBoundsException:

if (xml.indexOf(START_TAG) < 0) {
    return "";
}
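
A minimal sketch of where that guard might sit, assuming getElement() slices the captured XML between START_TAG and a matching END_TAG using indexOf and substring. The class name InnerXml, the END_TAG constant and the tag3 values are illustrative assumptions, not code taken from the blog post. Without the guard, indexOf returns -1 for a fragment that lacks the tag, and the substring call then gets an invalid range.

```java
// Hypothetical helper: names and constants are illustrative only.
final class InnerXml {
    static final String START_TAG = "<tag3>";
    static final String END_TAG = "</tag3>";

    /** Returns the text between START_TAG and END_TAG, or "" if either is missing. */
    static String extract(String xml) {
        int start = xml.indexOf(START_TAG);
        if (start < 0) {
            return ""; // the guard from the answer: tag not present in this fragment
        }
        int begin = start + START_TAG.length();
        int end = xml.indexOf(END_TAG, begin);
        if (end < 0) {
            return ""; // same idea for a missing closing tag
        }
        return xml.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(extract("<tag3><something1>Foo</something1></tag3>"));
        System.out.println(extract("<tag1>Value</tag1>").isEmpty());
    }
}
```

The second call shows the case the guard is for: a fragment such as tag1 that does not contain tag3 yields an empty string instead of an exception.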

One thing about this solution strikes me as strange: getElement() is called for every tag in the XML. The first call receives "Value", the second "ValueWord", and so on; each call's content is appended to what came before.

Update: I noticed that this approach works only for ONE occurrence of the tag that we want to read as a String. It cannot correctly parse the following example:

<root>
<parent1>
    <tag1>Value</tag1>
    <tag2>Word</tag2>
    <tag3>
        <something1>Foo</something1>
        <something2>Bar</something2>
        <something3>Baz</something3>
    </tag3>
</parent1>
<parent2>
    <tag1>Value</tag1>
    <tag2>Word</tag2>
    <tag3>
        <something1>TheSecondFoo</something1>
        <something2>TheSecondBar</something2>
        <something3>TheSecondBaz</something3>
    </tag3>
</parent2>
</root>

The "tag3" under "parent2" ends up with the values from the first tag3 (Foo, Bar, Baz) instead of (TheSecondFoo, TheSecondBar, TheSecondBaz). Any suggestions are appreciated. Thanks.
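
One possible explanation, offered as a guess rather than a confirmed diagnosis: if the handler writes every captured fragment into a single shared StringWriter that is never cleared, the text accumulates ("Value", then "ValueWord", ...) and indexOf always finds the first tag3. The simulation below uses only the JDK and no JAXB; SharedBufferDemo, run and the fragments are hypothetical stand-ins. It compares a shared buffer with one that is reset before each fragment.

```java
import java.io.StringWriter;

// Simulation only: stands in for a handler that reuses one StringWriter
// for every captured fragment (an assumption, not confirmed by the source).
final class SharedBufferDemo {
    static final String START_TAG = "<tag3>";
    static final String END_TAG = "</tag3>";

    static String extract(String xml) {
        int start = xml.indexOf(START_TAG);
        if (start < 0) {
            return "";
        }
        int begin = start + START_TAG.length();
        int end = xml.indexOf(END_TAG, begin);
        return end < 0 ? "" : xml.substring(begin, end);
    }

    /** Extracts from each fragment in turn; resetBetween clears the buffer first. */
    static String[] run(boolean resetBetween) {
        String[] fragments = {
            "<tag3><something1>Foo</something1></tag3>",
            "<tag3><something1>TheSecondFoo</something1></tag3>"
        };
        StringWriter shared = new StringWriter();
        String[] results = new String[fragments.length];
        for (int i = 0; i < fragments.length; i++) {
            if (resetBetween) {
                shared.getBuffer().setLength(0); // drop text left over from earlier fragments
            }
            shared.write(fragments[i]);
            results[i] = extract(shared.toString());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(false)[1]); // stale: content of the first tag3
        System.out.println(run(true)[1]);  // fresh: content of the second tag3
    }
}
```

If this is indeed the cause, clearing the writer (or creating a new one) each time the handler starts a fragment would be the corresponding change in the real handler.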

fanat1k
  • I independently stumbled into that blog post after asking the question, and ended up using that solution. Considering the utter fails that are the other responses, you get the green check mark. – Random Human Apr 18 '13 at 14:58
-1

I have a utility method that might come in handy in this case. See if it helps. I wrote some sample code using your example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractTagExample {

    public static void main(String[] args) {
        String text = "<root><tag1>Value</tag1><tag2>Word</tag2><tag3><something1>Foo</something1><something2>Bar</something2><something3>Baz</something3></tag3></root>";
        System.out.println(extractTag(text, "<tag3>"));
    }

    // Returns the text between the first occurrence of tag (e.g. "<tag3>")
    // and its closing tag, or "" if no match is found.
    public static String extractTag(String xml, String tag) {
        String endTag = "</" + tag.substring(1);

        // Quote both tags so they match literally; DOTALL lets '.' span line breaks.
        Pattern p = Pattern.compile(Pattern.quote(tag) + "(.*?)" + Pattern.quote(endTag), Pattern.DOTALL);
        Matcher m = p.matcher(xml);

        return m.find() ? m.group(1) : "";
    }
}
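
If more than one occurrence of the tag has to be pulled out, the same regex idea can be looped with find(). This is only a sketch: the name extractAll and the choice to pass a bare tag name ("tag3" rather than "<tag3>") are assumptions made for illustration, and the pattern still breaks on nested tags of the same name or on tags that carry attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the regex approach that collects every match.
final class RegexExtractAll {

    /** Returns the inner text of each <tagName>...</tagName> pair, in document order. */
    static List<String> extractAll(String xml, String tagName) {
        String quoted = Pattern.quote(tagName);
        // Reluctant (.*?) stops at the nearest closing tag; DOTALL lets '.' match newlines.
        Pattern p = Pattern.compile("<" + quoted + ">(.*?)</" + quoted + ">", Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String xml = "<root><parent1><tag3><a>Foo</a></tag3></parent1>"
                   + "<parent2><tag3><a>TheSecondFoo</a></tag3></parent2></root>";
        System.out.println(extractAll(xml, "tag3"));
    }
}
```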
Rodrigo Sasaki
  • I imagine it has something to do with this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Random Human May 28 '13 at 13:54