My input string looks like this:

 String msgs="<InfoStart>\r\n" 
            + "id:1234\r\n" 
            + "phone:912119882\r\n" 
            + "info_type:1\r\n"
            +<InfoEnd>\r\n"
            +"<InfoStart>\r\n" 
            + "id:5678\r\n" 
            + "phone:912119881\r\n" 
            + "info_type:1\r\n"
            +<InfoEnd>\r\n";

I can already use a regular expression to get the array of info blocks: `private static Pattern patter = Pattern.compile("InfoStart>([\\s\\S]*?)<InfoEnd>");`. But how do I get the id and phone using a regular expression? I tried to write the code below, but it fails. How can I fix it?

    private static Pattern infP = Pattern.compile("<InfoStart>([\\s\\S]*?)<InfoEnd>");
    private static Pattern lineP = Pattern.compile(".*?\r\n");

    final java.util.regex.Matcher matcher = patter.matcher(msgs);
    while (matcher.find()) {
        String item = matcher.group(1);
        Matcher matcherLine = lineP.matcher(item);
        while (matcherLine.find()) {
            if (matcherLine.groupCount() > 0) {
                String value = matcherLine.group(1);
                int firstIndex = value.indexOf(":");
                System.out.println("key:" + value.substring(0, firstIndex) + "value:" + value.substring(firstIndex + 1));
            }
        }
    }
flower
  • [Read here](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) to learn why it is generally a bad idea to parse HTML/XML content using regex. – Tim Biegeleisen Aug 24 '18 at 03:47
  • What output are you looking for? – The Scientific Method Aug 24 '18 at 04:08
  • @TheScientificMethod, the values of id, phone, info_type and so on. – flower Aug 24 '18 at 04:16
  • Is there some compelling reason you have to use regex? You could nail this down by splitting into tokens on `\r\n` and using `startsWith()`. – MarsAtomic Aug 24 '18 at 04:34
  • @MarsAtomic, here is the situation as I see it: we have agreed that each info detail ends with "\r\n". At first I replaced the "\r\n" with an HTML break tag, and it worked well. But I think that may not be a good approach, because if a field value such as info_type itself contains that tag, the result will be wrong. – flower Aug 24 '18 at 05:31
  • Why do you have to replace anything at all? Why don't you read about [String.split()](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-)? Because you're matching on `\r\n`, splitting will not touch any HTML break tags that may be in your input. – MarsAtomic Aug 24 '18 at 16:24

1 Answer

Perhaps you can try this:

    Pattern xmlPattern = Pattern.compile("<InfoStart>\\s+id:(\\d+)\\s+phone:(\\d+)\\s+info_type:(\\d+)\\s+<InfoEnd>");
    Matcher matcher = xmlPattern.matcher(msgs);
    while (matcher.find()) {
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
        System.out.println(matcher.group(3));
    }

The output:

    1234
    912119882
    1
    5678
    912119881
    1

Still, as Tim Biegeleisen mentioned, you would be better off using some other approach to parse an XML string.
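As one possible non-regex alternative, along the lines MarsAtomic suggests in the comments, the string can be split on `\r\n` and each line cut at its first colon. The following is only a minimal sketch: the class name `SplitInfoParser` and the helper `parse` are hypothetical, and it assumes every field sits on its own line in `key:value` form between the `<InfoStart>` and `<InfoEnd>` markers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitInfoParser {

    // Hypothetical helper: returns one key/value map per <InfoStart>...<InfoEnd> block.
    static List<Map<String, String>> parse(String msgs) {
        List<Map<String, String>> blocks = new ArrayList<>();
        Map<String, String> current = null;
        // String.split() takes a regex; "\r\n" here is simply the literal CR LF pair.
        for (String line : msgs.split("\r\n")) {
            if (line.equals("<InfoStart>")) {
                current = new LinkedHashMap<>();
            } else if (line.equals("<InfoEnd>")) {
                if (current != null) {
                    blocks.add(current);
                }
                current = null;
            } else if (current != null) {
                // Cut at the first colon only, so a value may itself contain colons.
                int colon = line.indexOf(':');
                if (colon > 0) {
                    current.put(line.substring(0, colon), line.substring(colon + 1));
                }
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String msgs = "<InfoStart>\r\n"
                + "id:1234\r\n"
                + "phone:912119882\r\n"
                + "info_type:1\r\n"
                + "<InfoEnd>\r\n"
                + "<InfoStart>\r\n"
                + "id:5678\r\n"
                + "phone:912119881\r\n"
                + "info_type:1\r\n"
                + "<InfoEnd>\r\n";
        for (Map<String, String> block : parse(msgs)) {
            System.out.println(block.get("id") + " " + block.get("phone") + " " + block.get("info_type"));
        }
    }
}
```

Because the split happens only on `\r\n`, any other markup inside a value is left untouched.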

Besides, your input string is incorrect; it should be:

    String msgs = "<InfoStart>\r\n"
            + "id:1234\r\n"
            + "phone:912119882\r\n"
            + "info_type:1\r\n"
            + "<InfoEnd>\r\n" // the opening double quote was missing here
            + "<InfoStart>\r\n"
            + "id:5678\r\n"
            + "phone:912119881\r\n"
            + "info_type:1\r\n"
            + "<InfoEnd>\r\n"; // the opening double quote was missing here
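With the corrected input string, the two-stage approach from the question can also be made to work. The question's code has two problems. The line pattern `.*?\r\n` has no capturing group, so `groupCount()` returns 0 and the body that calls `group(1)` never runs. It also calls `patter` although the pattern declared next to it is named `infP`. The following is a minimal sketch of a repaired version. The class name `RegexInfoParser`, the helper `parse`, and the two pattern constants are hypothetical names, and it assumes each field is a `key:value` line ending in `\r\n`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexInfoParser {

    // Stage 1: capture the body between the two markers (lazy, so blocks stay separate).
    private static final Pattern BLOCK = Pattern.compile("<InfoStart>\r\n([\\s\\S]*?)<InfoEnd>");
    // Stage 2: group 1 is the key, group 2 is the value up to the line's CR LF.
    private static final Pattern LINE = Pattern.compile("([^:\r\n]+):(.*?)\r\n");

    // Hypothetical helper: returns one key/value map per info block.
    static List<Map<String, String>> parse(String msgs) {
        List<Map<String, String>> blocks = new ArrayList<>();
        Matcher blockMatcher = BLOCK.matcher(msgs);
        while (blockMatcher.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher lineMatcher = LINE.matcher(blockMatcher.group(1));
            while (lineMatcher.find()) {
                fields.put(lineMatcher.group(1), lineMatcher.group(2));
            }
            blocks.add(fields);
        }
        return blocks;
    }

    public static void main(String[] args) {
        String msgs = "<InfoStart>\r\n"
                + "id:1234\r\n"
                + "phone:912119882\r\n"
                + "info_type:1\r\n"
                + "<InfoEnd>\r\n"
                + "<InfoStart>\r\n"
                + "id:5678\r\n"
                + "phone:912119881\r\n"
                + "info_type:1\r\n"
                + "<InfoEnd>\r\n";
        for (Map<String, String> fields : parse(msgs)) {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                System.out.println("key:" + e.getKey() + " value:" + e.getValue());
            }
        }
    }
}
```

Unlike a single pattern that lists `id`, `phone` and `info_type` in a fixed order, this version does not depend on the order or the number of fields in a block.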
Hearen
  • We have agreed that each info detail ends with "\r\n". At first I replaced the "\r\n" with an HTML break tag, and it worked well. But I think that may not be a good approach, because if a field value such as info_type itself contains that tag, the result will be wrong. – flower Aug 24 '18 at 05:29
  • @flower Why do you need to replace them? If the string you are parsing follows a fixed pattern, using regex is not a big problem; if it does not, you should just turn to an XML parser for that job. – Hearen Aug 24 '18 at 05:48