The following is a simplification of a regex I am using. On my development machine both $pattern1 and $pattern2 return a match, but on my production machine only $pattern1 does. The only difference between them is that $pattern2 has parentheses around one word. As far as I know, both are valid patterns that should match the given haystack.

$pattern1 = '/\<a name="ERROR TEXT"\>\<\/a\>\s*?validated\s*?\<\/span\>\s*?\<\/h1\>/';
$pattern2 = '/\<a name="ERROR TEXT"\>\<\/a\>\s*?(validated)\s*?\<\/span\>\s*?\<\/h1\>/';
$haystack = '- IFCS msg value, BOOKMARKED AS ERROR TEXT -->
          <a name="ERROR TEXT"></a>
             validated</span>
       </h1>

                <!-- START: .formActionHolder -->
                <div class="formActionHolder">';
preg_match($pattern1, $haystack, $matches);
print_r($matches);

Has anyone come across this problem before? Note that this is not the whole regex; it is a simplified version that isolates the problem. In my actual code the value 'validated' is not a constant, which is why I use parentheses to capture the word. The real patterns have other characters inside the parentheses so that I can capture the variable words. This simplified example just homes in on the problem I am having with two seemingly fine regexes.

On my development machine I am using PHP 5.3.2 with the PCRE 7.8 library, and on my production machine PHP 5.2.4 with PCRE 7.4.
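To compare the two environments directly, a minimal sketch like the one below could be run on both machines. It prints the PHP and PCRE versions and reports the result of each pattern along with `preg_last_error()`. The shortened haystack and the `$patterns` / `$results` arrays are assumptions made for illustration, not part of the original code.

```php
<?php
// Report the versions that matter for regex behaviour.
echo 'PHP: ', PHP_VERSION, "\n";
echo 'PCRE: ', PCRE_VERSION, "\n";

// Shortened haystack (an assumption for illustration; same structure as the original).
$haystack = "<a name=\"ERROR TEXT\"></a>\n   validated</span>\n  </h1>";

$patterns = array(
    'pattern1' => '/\<a name="ERROR TEXT"\>\<\/a\>\s*?validated\s*?\<\/span\>\s*?\<\/h1\>/',
    'pattern2' => '/\<a name="ERROR TEXT"\>\<\/a\>\s*?(validated)\s*?\<\/span\>\s*?\<\/h1\>/',
);

$results = array();
foreach ($patterns as $name => $pattern) {
    // preg_match returns 1 on a match, 0 on no match, and false on error.
    $results[$name] = preg_match($pattern, $haystack, $matches);
    echo $name, ': ';
    var_dump($results[$name]);
    // A non-zero value means PCRE itself reported a problem (e.g. a backtrack limit).
    echo '  preg_last_error: ', preg_last_error(), "\n";
}
```

If the two machines print different results for `pattern2`, that would point at the PHP/PCRE build rather than at the pattern itself.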

mulllhausen

3 Answers

Parentheses are used for grouping in a PHP regex and act as such unless you escape them, in which case they match the literal characters.
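A minimal sketch of the difference, using short made-up subject strings (the variable names here are hypothetical):

```php
<?php
// Unescaped parentheses form a capturing group; the subject needs no literal "(".
preg_match('/\s*(validated)\s*/', ' validated ', $grouped);
// $grouped[1] now holds the captured word.

// Escaped parentheses match literal "(" and ")" characters in the subject.
$literalHit  = preg_match('/\(validated\)/', '(validated)');
$literalMiss = preg_match('/\(validated\)/', 'validated');

var_dump($grouped[1], $literalHit, $literalMiss);
```

The first call captures the word without any parentheses in the subject, while the escaped form only matches when the subject actually contains them.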

borrible
  • I think that is the point - they should both match as there are no `()` characters in the subject string. Indeed, they do both match on [RegExr](http://gskinner.com/RegExr/)... – DaveRandom Sep 05 '11 at 08:35
  • I know. I want to capture the word 'validated' since it is not a constant – mulllhausen Sep 05 '11 at 08:35
  • @mulllhausen if it is not constant, that regex will not match if it has any value other than `validated`. Would you maybe be better using a [DOM parser](http://php.net/manual/en/book.dom.php) for this job? – DaveRandom Sep 05 '11 at 08:38
  • @mulllhausen - you might care to update your question to indicate what problem you are having. Certainly, on my machine, I get the behaviour I would expect from the regex, i.e. the second pattern matches the string for me and correctly picks up on the grouping. – borrible Sep 05 '11 at 08:39
  • @DaveRandom - I appreciate the advice, but I am pretty fluent with regexes and all my code is tailored around them, so it would be a massive hassle to change to a DOM parser. As I said earlier, this example is merely a simplification of my real regex (which has a more involved series of characters in the parentheses). However, the example I have given identifies the problem I have. – mulllhausen Sep 05 '11 at 08:42
  • @mulllhausen No worries if you have considered the options and concluded that is the best way to go then fair enough – DaveRandom Sep 05 '11 at 08:45
  • @borrible - OK, I've put in more details to explain my situation. Let me know if it's still not clear – mulllhausen Sep 05 '11 at 08:49
Are you sure $pattern2 doesn't match? In my Eclipse it matches and shows Array ( [0] => validated [1] => validated )

steve
  • Yeah, I know! It works fine on my dev machine running PHP 5.3.2 as well. – mulllhausen Sep 05 '11 at 08:38
  • @mulllhausen I cannot repeat this on PHP/5.2.9-2 Win32 or PHP/5.2.17 Win32 - is there any way you could upgrade PHP on the production machine? It's looking to me like this may be some kind of bizarre bug in 5.2.4 – DaveRandom Sep 05 '11 at 08:44
  • @DaveRandom - A PHP upgrade would be ideal; actually it was my first thought. But looking into it, I realised I would probably need to clone the system to test the upgrade, and that is going to be a lot of work :( It's really annoying that the dev and prod machines are not using the same versions of things! – mulllhausen Sep 05 '11 at 08:53
I had a thought about the ?( combination in $pattern2, so I removed the ? to make

$pattern = '/\<a name="ERROR TEXT"\>\<\/a\>\s*(validated)\s*?\<\/span\>\s*?\<\/h1\>/';

and that works! It's very strange, possibly even a bug?

So it looks like the ?(validated) part was being interpreted as a conditional subpattern, rather than the question mark making the \s* ungreedy.

That doesn't look like correct behavior to me.

Ah well... it's a bit of a pain, since now my * will be greedy. The pattern does what I want in this instance though.
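On the greediness worry: in the simplified pattern, \s* is followed by a token that cannot itself match whitespace, so the greedy and lazy forms should end up consuming the same run of whitespace. A minimal sketch, assuming a shortened haystack and trimmed-down patterns (both made up for illustration):

```php
<?php
// Shortened haystack (an assumption; same whitespace layout as the original).
$haystack = "<a name=\"ERROR TEXT\"></a>\n   validated</span>";

// Same pattern tail, once with lazy and once with greedy whitespace quantifiers.
$lazy   = '/\<\/a\>\s*?(validated)\s*?\<\/span\>/';
$greedy = '/\<\/a\>\s*(validated)\s*\<\/span\>/';

preg_match($lazy, $haystack, $lazyMatches);
preg_match($greedy, $haystack, $greedyMatches);

// Compare the full match and the captured group from both variants.
var_dump($lazyMatches === $greedyMatches);
```

This only holds while the group cannot start or end with whitespace. If the real pattern inside the parentheses can match whitespace, the two forms may split the text differently.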

Thanks for all your helpful comments!

mulllhausen