I've been working on a weekend project: a simple, lightweight XML parser, written just for fun and to learn more about regexes. I've been able to extract data from attributes and elements, but I'm having a hard time separating tags. This is what I have:

    CharSequence inputStr = "<a>test</a>abc<b1>test2</b1>abc1";
    String patternStr = openTag+"(.*?)"+closeTag;

    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(inputStr);

    StringBuffer buf = new StringBuffer();
    boolean found = false;
    while ((found = matcher.find())) {
      String replaceStr = matcher.group();
      matcher.appendReplacement(buf, "found tag (" + replaceStr + ")");
    }
    matcher.appendTail(buf);

    String result = buf.toString();
    System.out.println(result);


Output: found tag (<a>test</a>abc<b1>test2</b1>)abc1

I need each 'found tag' match to end at its own tag instead of spanning the whole group. Is there a way to do that? Thanks.

user1681891
  • Rhetorical question: Can you explain, in one sentence, why you are trying to use regular expressions on XML? – Tomalak Nov 18 '12 at 21:11
  • @Tomalak I really don't know what else would work. What would you suggest? – user1681891 Nov 18 '12 at 23:05
  • An XML parser is the only thing that should be used on XML. You can use [the built-in DOM API](http://stackoverflow.com/q/33262/18771) or a [third-party library like XOM](http://www.xom.nu/), which is supposed to be quite easy to learn and use. Another question [discusses what options you have](http://stackoverflow.com/q/373833/18771). Of course I must add the obligatory link to the single highest-voted answer on Stack Overflow, which discusses [why you should not use regex on XML/HTML](http://stackoverflow.com/q/1732348/18771). – Tomalak Nov 19 '12 at 07:24
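
A rough sketch of the parser-based route mentioned in the comments, using the JDK's built-in DOM API. The sample string from the question has no single root element, so this assumes it is acceptable to wrap it in a synthetic `<root>` element first. The class name `DomTagLister` and the method `describe` are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTagLister {
    // Hypothetical helper: reports each top-level element and keeps the text between them.
    static String describe(String fragment) {
        // Assumption: wrapping in <root> is acceptable, since XML needs exactly one root element.
        String xml = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    // Element: report its name and its text content.
                    out.append("found tag (").append(node.getNodeName())
                       .append(": ").append(node.getTextContent()).append(")");
                } else if (node.getNodeType() == Node.TEXT_NODE) {
                    // Plain text between elements is copied through unchanged.
                    out.append(node.getNodeValue());
                }
            }
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<a>test</a>abc<b1>test2</b1>abc1"));
    }
}
```

Unlike the regex version, this also copes with attributes and nested elements, at the cost of requiring well-formed input.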

1 Answer

You can try something like the following to get the behavior you need:

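    // Inside the while (matcher.find()) loop. Capturing groups are numbered
    // 1..groupCount(); group(0) is the entire match.
    StringBuilder replaceStr = new StringBuilder();
    for (int i = 1; i <= matcher.groupCount(); i++) {
        replaceStr.append("found tag (").append(matcher.group(i)).append(")");
    }
    // Call appendReplacement once per match; quoteReplacement escapes '$' and '\'.
    matcher.appendReplacement(buf, Matcher.quoteReplacement(replaceStr.toString()));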
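
The question does not show how `openTag` and `closeTag` are defined, so here is a minimal, self-contained sketch under my own assumptions: tags have no attributes and are not nested, and the pattern `<(\w+)>(.*?)</\1>` is a hypothetical stand-in for the original. The back-reference `\1` forces the closing tag to repeat the name of the opening tag, and the reluctant `.*?` stops at the first such closing tag, so each element is matched on its own. The class name `TagFinder` and the method `markTags` are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagFinder {
    // Hypothetical pattern: group 1 is the tag name, group 2 the body.
    // \1 requires the closing tag to carry the same name as the opening tag.
    private static final Pattern TAG = Pattern.compile("<(\\w+)>(.*?)</\\1>");

    static String markTags(String input) {
        Matcher matcher = TAG.matcher(input);
        StringBuffer buf = new StringBuffer();
        while (matcher.find()) {
            // group() with no argument is the whole matched element.
            String replacement = "found tag (" + matcher.group() + ")";
            // quoteReplacement keeps '$' and '\' in the match from being
            // interpreted as group references in the replacement string.
            matcher.appendReplacement(buf, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(buf);
        return buf.toString();
    }

    public static void main(String[] args) {
        // prints: found tag (<a>test</a>)abcfound tag (<b1>test2</b1>)abc1
        System.out.println(markTags("<a>test</a>abc<b1>test2</b1>abc1"));
    }
}
```

Note that `.` does not match line breaks by default, so an element whose body spans several lines would need `Pattern.DOTALL`. This is also exactly the kind of input where a real XML parser is the safer choice.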
dinukadev