I am trying to use Java regular expressions to do something I could have sworn I have done many times over, but it seems I have a problem.

Basically, I am using ".*" to skip over everything I don't need until I find something I need.

It is easier for me to explain in code than in writing:

// Needs java.util.regex.Pattern and java.util.regex.Matcher
String str = "class=\"c\" div=34234542234234</span>correct<?> blah=12354234234 </span>wrong<";
Pattern regex = Pattern.compile("class=\"c\".*</span>([^<]*)");
Matcher matcher = regex.matcher(str);
boolean found = false;
while (matcher.find()) {
    found = true;
    System.out.println("Found match: " + matcher.group(1));
}
if (!found) {
    System.out.println("No matches found");
}

I want my regex to capture "correct", but instead it skips ahead to the last `</span>` and captures "wrong".

Can anyone help me out?

user unknown
  • `.*` is greedy and will try to consume as many characters as possible. I am not sure, but try `.*?`, which is reluctant. – nhahtdh May 29 '12 at 01:26
  • `.*?` will do what you want, but parsing HTML with regular expressions is considered harmful: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – MK. May 29 '12 at 01:33

1 Answer

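You are missing the reluctant quantifier marker after `*` - it should be `.*?` instead. A plain `.*` is greedy: it first consumes the rest of the input and then backtracks only as far as it must, so it stops at the last `</span>` rather than the first.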
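
To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the question. The class name `ReluctantDemo` and the helper `firstCapture` are made up for illustration; only `Pattern`, `Matcher`, `find()` and `group(int)` come from `java.util.regex`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null if the pattern does not match anywhere in the input.
    public static String firstCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "class=\"c\" div=34234542234234</span>correct<?> blah=12354234234 </span>wrong<";

        // Greedy: .* runs to the end of the input, then backtracks to the LAST </span>.
        System.out.println(firstCapture("class=\"c\".*</span>([^<]*)", str));   // prints "wrong"

        // Reluctant: .*? grows one character at a time and stops at the FIRST </span>.
        System.out.println(firstCapture("class=\"c\".*?</span>([^<]*)", str));  // prints "correct"
    }
}
```

The only change between the two patterns is the `?` after `.*`; everything else, including the capture group `([^<]*)`, stays the same.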

Sergey Kalinichenko