
My XML string is:

    String neMsg= "<root>" 
              +"   <CONTENT>"
              +"                <![CDATA[00000:<ResponseClass Name=\"Response\"><ITEM>HAHA</ITEM></ResponseClass>]]>"
              +"        </CONTENT>"
              +"</root>";

I have tried writing the code in four ways, but I still cannot get the content. How do I solve this problem?

    //java.util.regex.Pattern pP0=java.util.regex.Pattern.compile("<!\\[CDATA\\[00000:(\\s|\\S)*?\\]\\]>");
    //java.util.regex.Pattern pP0=java.util.regex.Pattern.compile("<!\\[CDATA\\[00000:(.*)\\]\\]>");
    //java.util.regex.Pattern pP0=java.util.regex.Pattern.compile("<CONTENT>(.*)<!\\[CDATA\\[(.*)\\]\\]>(.*)</CONTENT>");
    Pattern pP0 = Pattern.compile(".*<!\\[CDATA\\[00000:(.*)\\]\\]>.*");
    java.util.regex.Matcher mP0 = pP0.matcher(neMsg);
    System.out.println(mP0.group(1));
Emma
flower
  • Don't use regular expressions for parsing XML (or HTML or similar)! – Seelenvirtuose May 01 '19 at 11:50
  • [Have a look at this post](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Lino May 01 '19 at 11:54
  • Please take a look at [Using regular expressions to parse HTML: why not?](https://stackoverflow.com/q/590747), [Can you provide some examples of why it is hard to parse XML and HTML with a regex?](https://stackoverflow.com/q/701166) – Pshemo May 01 '19 at 12:08

1 Answer


You should not parse XML or HTML with regex. Use a real parser instead, such as jsoup (which also has an XML mode) or the XML parser that ships with the JDK.
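As a rough sketch of the parser route, the JDK's built-in DOM parser (javax.xml.parsers) can read the CDATA section directly, with no regex and no extra dependency. The class name CdataWithDom and the method extractPayload are hypothetical; the sketch assumes the document has a CONTENT element (only the first one is read) and that the payload may start with the "00000:" prefix from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class CdataWithDom {

    // Hypothetical helper: returns the text inside the first <CONTENT> element
    // (whitespace trimmed), with a leading "00000:" prefix removed if present.
    static String extractPayload(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent() concatenates the child text nodes, i.e. the
            // surrounding whitespace plus the CDATA section's content.
            String text = doc.getElementsByTagName("CONTENT").item(0).getTextContent().trim();
            String prefix = "00000:";
            return text.startsWith(prefix) ? text.substring(prefix.length()) : text;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String neMsg = "<root>"
                + "   <CONTENT>"
                + "                <![CDATA[00000:<ResponseClass Name=\"Response\"><ITEM>HAHA</ITEM></ResponseClass>]]>"
                + "        </CONTENT>"
                + "</root>";
        // prints <ResponseClass Name="Response"><ITEM>HAHA</ITEM></ResponseClass>
        System.out.println(extractPayload(neMsg));
    }
}
```

Because the parser understands CDATA, the markup inside it comes back as plain text and the whitespace around it is handled by trim(), so nothing depends on how the string happens to be laid out.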

The actual problem in your code is that you must call find() or matches() on the Matcher before you can access any group:

  • find() looks for the pattern anywhere in the string.
  • matches() requires the pattern to match the whole string.

Always check the boolean returned by find() or matches() first (with an if or while); calling group() when no match has been made throws an IllegalStateException. Then call group(1) to get the contents of capturing group 1; group(0) (or group()) returns the whole match instead.
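A small sketch of these rules, using a hypothetical class FindVsMatches and a shortened version of the question's XML. The pattern here deliberately drops the .* wrappers, so it shows find() succeeding on a substring while matches() fails on the full document.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {

    // The question's pattern without the leading/trailing ".*" wrappers.
    static final Pattern CDATA = Pattern.compile("<!\\[CDATA\\[00000:(.*)\\]\\]>");

    // Returns group(1) if the pattern occurs anywhere in the input, else null.
    static String findPayload(String input) {
        Matcher m = CDATA.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // matches() succeeds only if the pattern covers the ENTIRE input.
    static boolean matchesWhole(String input) {
        return CDATA.matcher(input).matches();
    }

    // Calling group() before any successful find()/matches() throws IllegalStateException.
    static boolean groupWithoutMatchThrows(String input) {
        try {
            CDATA.matcher(input).group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String xml = "<root><CONTENT><![CDATA[00000:<ITEM>HAHA</ITEM>]]></CONTENT></root>";
        System.out.println(findPayload(xml));             // <ITEM>HAHA</ITEM>
        System.out.println(matchesWhole(xml));            // false: <root>... is not covered
        System.out.println(groupWithoutMatchThrows(xml)); // true
        Matcher m = CDATA.matcher(xml);
        if (m.find()) {
            System.out.println(m.group(0)); // whole match: <![CDATA[00000:<ITEM>HAHA</ITEM>]]>
        }
    }
}
```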

Change your code to this:

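    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String neMsg = "<root>" + "   <CONTENT>"
                    + "                <![CDATA[00000:<ResponseClass Name=\"Response\"><ITEM>HAHA</ITEM></ResponseClass>]]>"
                    + "        </CONTENT>" + "</root>";

            Pattern pP0 = Pattern.compile(".*<!\\[CDATA\\[00000:(.*)\\]\\]>.*");
            Matcher mP0 = pP0.matcher(neMsg);
            // find() (or matches()) must be called and return true before group() is used.
            // matches() also works here because the pattern is wrapped in .* on both sides.
            if (mP0.find()) {
                System.out.println(mP0.group(1)); // group 1 = the text captured by (.*)
            }
        }
    }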

This prints the contents of group 1, i.e. the CDATA payload after 00000: (not the whole match):

    <ResponseClass Name="Response"><ITEM>HAHA</ITEM></ResponseClass>
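One caveat with the pattern above: the question's string contains no line breaks, but if the real XML does (an assumption here), '.' will not match '\n' by default and the match fails. The question's first attempt, (\s|\S)*?, works around exactly that; the Pattern.DOTALL flag with a lazy .*? is the usual equivalent. The class CdataAcrossLines, its fields and the multi-line sample string are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CdataAcrossLines {

    // Without DOTALL, '.' does not match line terminators.
    static final Pattern WITHOUT_DOTALL = Pattern.compile("<!\\[CDATA\\[00000:(.*?)\\]\\]>");
    // With DOTALL, '.' also matches '\n', so a multi-line payload is captured.
    static final Pattern WITH_DOTALL = Pattern.compile("<!\\[CDATA\\[00000:(.*?)\\]\\]>", Pattern.DOTALL);

    // Returns group(1) of the first match, or null if there is none.
    static String firstPayload(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical multi-line version of the question's XML.
        String xml = "<root>\n"
                + "  <CONTENT>\n"
                + "    <![CDATA[00000:<ResponseClass Name=\"Response\">\n"
                + "      <ITEM>HAHA</ITEM>\n"
                + "    </ResponseClass>]]>\n"
                + "  </CONTENT>\n"
                + "</root>";
        System.out.println(firstPayload(WITHOUT_DOTALL, xml)); // null
        System.out.println(firstPayload(WITH_DOTALL, xml));    // the three-line payload
    }
}
```

Since find() searches anywhere in the input, the leading and trailing .* are not needed here; the lazy .*? stops at the first ]]>.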
Pushpesh Kumar Rajwanshi