
I have an XML-style string that I am trying to get groups out of in a while(matcher.find()){} loop. Here is the regex I am using:

<myset setName="(.+?)">(.*?)</myset>

And here it is converted for use in Java:

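import android.util.Log;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern setPattern = Pattern.compile("<myset setName=\"(.+?)\">(.*?)</myset>");
Matcher matcher = setPattern.matcher(targetString);
while (matcher.find()) {
    // group(1) is the setName ($1), group(2) is the body between the tags ($2)
    Log.i(TAG, "First group: " + matcher.group(1) + "  Second group: " + matcher.group(2));
}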

$1 (group 1) is the setName; it should always be at least one character.

$2 (group 2) is everything (or nothing) between the opening and closing tags; it can be zero or more characters.

If I do a find() on the string: <myset setName="test"><lots of stuff in this subtag /></myset>

It works perfectly, with $1 being assigned test and $2 assigned <lots of stuff in this subtag />

However, if I do a find() on this string: <myset setName="test"><lots of stuff in this subtag /></myset><myset setName="test2"><more stuff in this subtag /></myset>

Then $1 matches test and $2 matches <lots of stuff in this subtag /></myset><myset setName="test2"><more stuff in here />

The intended behavior is the first find() should have $1 match test and $2 match <lots of stuff in this subtag />. Then the 2nd find() should have $1 match test2 and $2 match <more stuff in this subtag />.

I am sure I am overlooking something obvious. Thanks!

Takamori
  • Whenever I see XML and regex in the same post, I can only think about this - http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags?lq=1 – MByD Jun 27 '13 at 18:14
  • Love that post! Fortunately this is for a very specific and small set of tags whose formatting I completely control. I am writing the "XML" string elsewhere and reading it back here. Maybe I shouldn't even call it XML. – Takamori Jun 27 '13 at 18:18
  • [I cannot reproduce this](http://fiddle.re/2tk6a) – Martin Ender Jun 27 '13 at 18:23
  • @Takamori: post an [SSCCE](http://sscce.org) demonstrating your behaviour. I cannot reproduce it. The example you provided works as expected. – jlordo Jun 27 '13 at 18:25
  • @Takamori Oh, then the solution is easy: Write and read XML instead. – Hauke Ingmar Schmidt Jun 27 '13 at 18:26
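A self-contained check of the kind jlordo asks for might look like the following minimal sketch. The class name MysetDemo, the helper findSets, and the use of System.out in place of Android's Log.i are assumptions for illustration only. With the reluctant (.*?), each find() should stop at the nearest </myset>, which is the intended behavior described in the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MysetDemo {
    // The question's pattern: reluctant (.+?) for the name, reluctant (.*?) for the body.
    static final Pattern SET_PATTERN =
            Pattern.compile("<myset setName=\"(.+?)\">(.*?)</myset>");

    // Collects "name|body" for every match that find() reports, in order.
    static List<String> findSets(String target) {
        List<String> results = new ArrayList<>();
        Matcher matcher = SET_PATTERN.matcher(target);
        while (matcher.find()) {
            results.add(matcher.group(1) + "|" + matcher.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String target = "<myset setName=\"test\"><lots of stuff in this subtag /></myset>"
                + "<myset setName=\"test2\"><more stuff in this subtag /></myset>";
        for (String set : findSets(target)) {
            System.out.println(set);
        }
    }
}
```

Run on the two-set string from the question, it should print one line per set: test|<lots of stuff in this subtag /> followed by test2|<more stuff in this subtag />.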

2 Answers


For this:

Then $1 matches test and $2 matches <lots of stuff in this subtag /></myset><myset setName="test2"><more stuff in here />

It sounds like you were using the following expression: <myset setName="(.+?)">(.*)<\/myset>

Example: http://www.rubular.com/r/516CWASjAF
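The difference between greedy (.*) and reluctant (.*?) can also be seen directly in Java with a minimal sketch like this one. The class and method names are hypothetical, and plain / is used because the \/ escape from the Rubular version is not needed in a Java regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Greedy body: (.*) grabs as much as possible, then backtracks to the LAST </myset>.
    static final String GREEDY = "<myset setName=\"(.+?)\">(.*)</myset>";
    // Reluctant body: (.*?) grows one character at a time and stops at the FIRST </myset>.
    static final String RELUCTANT = "<myset setName=\"(.+?)\">(.*?)</myset>";

    // Returns group(2) of the first match, or null if the pattern does not match.
    static String firstBody(String regex, String target) {
        Matcher m = Pattern.compile(regex).matcher(target);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String target = "<myset setName=\"test\"><lots of stuff in this subtag /></myset>"
                + "<myset setName=\"test2\"><more stuff in this subtag /></myset>";
        System.out.println(firstBody(GREEDY, target));
        System.out.println(firstBody(RELUCTANT, target));
    }
}
```

The greedy version should print <lots of stuff in this subtag /></myset><myset setName="test2"><more stuff in this subtag />, reproducing the symptom from the question, while the reluctant version prints just <lots of stuff in this subtag />.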

Your expression <myset setName="(.+?)">(.*?)<\/myset> works for me on your sample string, but I do recommend changing it to the following. If the quoted text is empty, (.+?) still has to consume at least one character, so it swallows the closing quote and runs on to the next "> in the string; [^"]* can match nothing and can never leave the quoted attribute value:

<myset setName="([^"]*)">(.*?)<\/myset>
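To see why ([^"]*) is safer, here is a minimal Java sketch comparing both name groups on a hypothetical input whose first set has an empty setName. The sample tags <x /> and <y />, the setName b, and all names in the code are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyNameDemo {
    // Original name group: (.+?) needs at least one character, so it can run past the closing quote.
    static final String LAZY_NAME = "<myset setName=\"(.+?)\">(.*?)</myset>";
    // Suggested name group: ([^\"]*) may match nothing but can never cross a quote.
    static final String NO_QUOTE_NAME = "<myset setName=\"([^\"]*)\">(.*?)</myset>";

    // Returns group(1) of every match find() reports, in order.
    static List<String> names(String regex, String target) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(target);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: the first set has an empty setName.
        String target = "<myset setName=\"\"><x /></myset><myset setName=\"b\"><y /></myset>";
        System.out.println(names(LAZY_NAME, target));
        System.out.println(names(NO_QUOTE_NAME, target));
    }
}
```

With (.+?) the only match reports a name of "><x /></myset><myset setName="b, because the group must consume the closing quote and then keeps going until the next ">. With ([^"]*) the two sets come back separately, with an empty name and b.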

Ro Yo Mi

It turns out my problem wasn't in my syntax or code, as many of you pointed out, but in the Regex Util plugin I was using in Eclipse. Thank you all for the help. I'll learn to trust my code and test it, even when my tools don't indicate that it should work!

Takamori