
I need to extract only the data between the "CHAR" tags from the following string:

Input:

<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>

Expected output: Number 7015:188188

I am looking for something efficient.

Any recommendations?

Thanks

angus

5 Answers


It is good practice to avoid parsing XML/HTML with regex. Instead, you can use a proper XML parser. I like to use jsoup, so here is an example of how it can be done with this library:

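import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

String data = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";

// Parse as XML (not HTML) so the tag names are kept as written
Document doc = Jsoup.parse(data, "", Parser.xmlParser());
// text() joins the text of all matched elements with a space
String charText = doc.select("CHAR").text();

System.out.println(charText);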

Output: Number 7015:188188
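
If adding jsoup as a dependency is not an option, the same idea can probably be sketched with the DOM parser that ships with the JDK. This is only a minimal sketch, assuming the input is well-formed XML; the class name CharTagDom and the helper extractCharText are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CharTagDom {
    // Hypothetical helper: returns the text of every <CHAR> element, joined by spaces.
    static String extractCharText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("CHAR");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return String.join(" ", values);
        } catch (Exception e) {
            // Assumption: malformed input is treated as a caller error
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String data = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
        System.out.println(extractCharText(data)); // Number 7015:188188
    }
}
```

Unlike a regex, this keeps working if the CHAR elements later gain attributes or the document contains line breaks.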

Pshemo

I think you meant to capture the content between the tags rather than split the string.

It's well known that you should NOT use a regex to parse XHTML, since you can get weird content. However, if you still want a regex, you can use one like this:

<CHAR>(.*?)<\/CHAR>


And you can have this java code:

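import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
Pattern pattern = Pattern.compile("<CHAR>(.*?)<\\/CHAR>");
Matcher matcher = pattern.matcher(line);

// Collect every captured group, separated by spaces
StringBuilder result = new StringBuilder();
while (matcher.find()) {
    result.append(matcher.group(1)).append(' ');
}
// trim() drops the trailing separator
System.out.println(result.toString().trim()); // Prints: Number 7015:188188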

Update: as Pshemo pointed out in his comment:

/ is not a special character in the Java regex engine. You don't have to escape it

So, you can use:

Pattern pattern = Pattern.compile("<CHAR>(.*?)</CHAR>");

Btw, I really like Pshemo's answer; it's a nice way to solve this without a regex.
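
Since the question asks for something efficient, one possible refinement is to compile the pattern once and collect the captured values into a list. The following is a minimal sketch under that assumption; the class name CharTagRegex and the method extractCharValues are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharTagRegex {
    // Compiled once and reused, so repeated calls do not recompile the regex
    private static final Pattern CHAR_TAG = Pattern.compile("<CHAR>(.*?)</CHAR>");

    // Hypothetical helper: returns the content of each <CHAR>...</CHAR> pair, in order
    static List<String> extractCharValues(String input) {
        List<String> values = new ArrayList<>();
        Matcher matcher = CHAR_TAG.matcher(input);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
        // String.join adds separators only between items, so there is no trailing space
        System.out.println(String.join(" ", extractCharValues(line))); // Number 7015:188188
    }
}
```

Note that `.` does not match line terminators by default, so values spanning several lines would need the Pattern.DOTALL flag.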

Federico Piazza
  • `/` is not a special character in the Java regex engine. You don't have to escape it (it only creates confusion). – Pshemo Apr 10 '15 at 20:51

If you know the tag value is always a run of digits, optionally followed by a colon and more digits, and it is the only <CHAR> tag with such a numeric value, you may want to use this regex:

 (?<=<CHAR>)\d+(?::\d+)?(?=<\/CHAR>)

Java string:

 String pattern = "(?<=<CHAR>)\\d+(?::\\d+)?(?=</CHAR>)";

Sample code:

String str = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
Pattern ptrn = Pattern.compile("(?<=<CHAR>)\\d+(?::\\d+)?(?=</CHAR>)");
Matcher matcher = ptrn.matcher(str);
if (matcher.find()) {
   System.out.println(matcher.group(0));
}

Output:

7015:188188
Wiktor Stribiżew
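String s = inputString; // inputString holds the XML string from the question
String result = "";
// Repeatedly take the text between the next <CHAR> ... </CHAR> pair
while (s.indexOf("<CHAR>") != -1)
{
    result += s.substring(s.indexOf("<CHAR>") + "<CHAR>".length(), s.indexOf("</CHAR>")) + " ";
    // Drop everything up to and including the closing tag just consumed
    s = s.substring(s.indexOf("</CHAR>") + "</CHAR>".length());
}

// result is now the desired output (with a trailing space; call result.trim() to drop it)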
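
The same scan can likely be done without copying the remaining string on every iteration, by passing a start offset to indexOf and appending into a StringBuilder. This is a minimal sketch, assuming the tags carry no attributes and are not nested; CharTagScan and extractCharText are hypothetical names used only for illustration.

```java
class CharTagScan {
    private static final String OPEN = "<CHAR>";
    private static final String CLOSE = "</CHAR>";

    // Hypothetical helper: single pass over the input, no intermediate substrings
    static String extractCharText(String input) {
        StringBuilder result = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = input.indexOf(OPEN, pos);
            if (start == -1) {
                break; // no more opening tags
            }
            start += OPEN.length();
            int end = input.indexOf(CLOSE, start);
            if (end == -1) {
                break; // unclosed tag: stop instead of throwing
            }
            if (result.length() > 0) {
                result.append(' '); // separator only between values
            }
            result.append(input, start, end);
            pos = end + CLOSE.length();
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String inputString = "<MSG><KEY>name.extObject</KEY><PARAM><CHAR>Number</CHAR><CHAR>7015:188188</CHAR></PARAM></MSG>";
        System.out.println(extractCharText(inputString)); // Number 7015:188188
    }
}
```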
Andy Brunner

The regex for that is: <CHAR>(.*?)</CHAR>

However, it is better to use an XML parser for that.

shepard23