I'm trying to write a regular expression to use in a Java program that will recognize a pattern that may appear in the input an unknown number of times. My silly little example is:

String patString = "(?:.*(h.t).*)*";

Then I try to access the matches from a line like "the hut is hot" by looping through matcher.group(i). It only remembers the last match (in this case, "hot") because there is only one capture group--I guess the contents of matcher.group(1) get overwritten as the capture group is reused. What I want, though, is some kind of array containing both "hut" and "hot."

Is there a better way to do this? FWIW, what I'm really trying to do is to pick up all the (possibly multiword) proper nouns after a signal word, where there may be other words and punctuation in between. So if "saw" is the signal and we have "I saw Bob with John Smith, and his wife Margaret," I want {"Bob","John Smith","Margaret"}.

umbraphile
  • What about using only `h.t` as the pattern string? – vbence Mar 26 '11 at 19:43
  • If I use only `(h.t)` (with parens to make it a capture group), then I get only the first occurrence, instead of the last. (Is that what you meant?) – umbraphile Mar 26 '11 at 19:46
  • I asked this myself over here: http://stackoverflow.com/questions/5018487/regular-expression-with-variable-number-of-groups – aioobe Mar 26 '11 at 19:47
  • Huh, how did I not see that question? I read a bunch of similar ones before posting, but you're right, yours is exactly the same. Doesn't sound too promising--how did you work around it? – umbraphile Mar 26 '11 at 19:51
  • @umbraphile You don't really need parentheses. `group()` or `group(0)` will give you back the whole match. – vbence Mar 26 '11 at 22:59

1 Answer

(Similar question: Regular expression with variable number of groups?)

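This is not possible in Java: a capture group inside a repeated construct keeps only its last match. Your best alternative is to use h.t as the whole pattern and step through the matches one at a time:

while (matcher.find()) {
    ...
    ... matcher.group(); ...  // the whole match; h.t alone has no group 1
    ...
}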

The feature does exist in .NET, where each group exposes all of its captures through Group.Captures, but as mentioned above, there's no counterpart in Java.
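
As a rough sketch of how the find() loop might look in full, the class below collects every match into a list. The names FindAllMatches, findAll and properNounsAfter are made up for illustration, and the proper-noun pattern (runs of capitalized words separated by single spaces, taken after the first occurrence of the signal word) is only an assumed heuristic for the "saw" example in the question, not a general solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllMatches {

    // Collects every non-overlapping match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // group() with no argument returns the whole match, so no parentheses are needed
            matches.add(matcher.group());
        }
        return matches;
    }

    // Hypothetical helper: looks for runs of capitalized words after the first
    // standalone occurrence of the signal word. The pattern is an assumption
    // made for this example only.
    static List<String> properNounsAfter(String signal, String input) {
        Matcher signalMatcher =
                Pattern.compile("\\b" + Pattern.quote(signal) + "\\b").matcher(input);
        if (!signalMatcher.find()) {
            return new ArrayList<>(); // signal word not present
        }
        String rest = input.substring(signalMatcher.end());
        return findAll("\\b[A-Z][a-z]+(?: [A-Z][a-z]+)*", rest);
    }

    public static void main(String[] args) {
        System.out.println(findAll("h.t", "the hut is hot"));
        System.out.println(
                properNounsAfter("saw", "I saw Bob with John Smith, and his wife Margaret,"));
    }
}
```

Each call to find() resumes where the previous match ended, which is why a single simple pattern is enough and no repeated group is required.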

aioobe
  • Okay--I tried this with my hat/hut/hot example and it's fine--just need to translate it to my more complicated real-world problem! Thanks. – umbraphile Mar 26 '11 at 20:12