Possible Duplicate:
java regex quantifiers
I am learning some regex right now, and I'm having trouble with this problem.
I have a string like: TAG1 sometext TAG2 some text TAG3 someText
What I need to get are the substrings between the tag statements, something like:
TAG1 sometext
TAG2 some text
TAG3 someText
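So I wrote this regex:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("TAG\\d.*TAG\\d");
Matcher matcher = pattern.matcher(string);
// find() and group() are methods, so they need parentheses
while (matcher.find()) {
    System.out.println(matcher.group());
}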
But the output is a single match:
TAG1 sometext TAG2 some text TAG3
As I understand it, the dot matches any character (except line terminators, by default) and the star repeats it zero or more times. So I read my regex as: TAG followed by a digit, then anything, then TAG followed by another digit.
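The reading above is right, but it leaves out that `*` is greedy: `.*` first consumes as much as it can and only backs off as far as needed for the final `TAG\d` to match, so it ends at the last tag rather than the next one. A minimal sketch comparing the greedy form with the reluctant `.*?` form, assuming the sample string from the question (the class and helper names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Collects every match of regex in input, in order (hypothetical helper).
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "TAG1 sometext TAG2 some text TAG3 someText";
        // Greedy: .* runs to the end of the input, then backtracks only
        // as far as needed to find the LAST "TAG<digit>".
        System.out.println(findAll("TAG\\d.*TAG\\d", input));
        // prints [TAG1 sometext TAG2 some text TAG3]

        // Reluctant: .*? grows one character at a time and stops at the
        // FIRST "TAG<digit>" that lets the rest of the pattern match.
        System.out.println(findAll("TAG\\d.*?TAG\\d", input));
        // prints [TAG1 sometext TAG2]
    }
}
```

The reluctant version finds only one match, because the closing `TAG2` is consumed by the first match and the remaining text contains just one more tag.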
While writing this, I also realized that I do not want every possible TAG# text TAG# combination. For example, I do not want TAG# text TAG# text TAG#.
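To get one non-overlapping match per tag, the next tag has to end a match without being consumed by it. A lookahead `(?=...)` does that. This is a sketch, assuming tags are literally `TAG` followed by a single digit, as in the question's own regex (the `TagSections` class and `sections` method are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSections {
    // TAG<digit>, then as few characters as possible, up to (but not
    // including) optional whitespace plus the next TAG<digit>, or the end
    // of input. The lookahead checks for the next tag without consuming
    // it, so that tag can still start the next match.
    private static final Pattern SECTION =
            Pattern.compile("TAG\\d.*?(?=\\s*TAG\\d|$)");

    static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : sections("TAG1 sometext TAG2 some text TAG3 someText")) {
            System.out.println(s);
        }
        // prints:
        // TAG1 sometext
        // TAG2 some text
        // TAG3 someText
    }
}
```

If the text between tags can span several lines, the pattern would also need the `Pattern.DOTALL` flag, since `.` does not match line terminators by default.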
Can someone help improve my understanding of regex, please?
Thanks
EDIT ---
I am not writing a full-blown HTML parser in regex. This is an HTML parsing project, and I am using Jsoup for most of it. This regex is just a hack to extract some metadata about the HTML so that I can pass the HTML to Jsoup in one form or another.