
I was writing a regex pattern when I suddenly found out that preg_match_all isn't matching content that spans more than one line. What's up with that, and how do I fix it?

test1.php

$content = file_get_contents("test2.php");
preg_match_all("~startMatch(.+?)endMatch~", $content, $matches);

print_r($matches);

test2.php

startMatch text 
endMatch

Results:

Array ( [0] => Array ( ) [1] => Array ( ) )
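A single-file reproduction may make the failure easier to test. This sketch assumes test2.php contains exactly the two lines shown above (including the trailing space after "text") and inlines them as a string literal instead of reading the file.

```php
<?php
// Assumed stand-in for test2.php: the same two lines, inlined as a string.
$content = "startMatch text \nendMatch\n";

// Same pattern as test1.php. With no modifiers, "." does not match "\n",
// so ".+?" cannot cross the line break between "text " and "endMatch".
$count = preg_match_all("~startMatch(.+?)endMatch~", $content, $matches);

var_dump($count);   // int(0)
print_r($matches);  // both groups come back as empty arrays
```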

Expected Results:

Array ( [0] => Array ( [0] => startMatch text endMatch ) [1] => Array ( [0] => text ) )

However, I do get the expected result if startMatch text endMatch is all on the same line. What's up with that? Doesn't a period in a regex match any character? How do I fix it?
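For reference, the 's' modifier discussed in the comments (PCRE's dotall mode) is what changes this: by default "." matches any character except a newline, and with 's' it matches newlines too. A minimal sketch, again assuming the two-line contents of test2.php are inlined as a string:

```php
<?php
// Assumed stand-in for test2.php.
$content = "startMatch text \nendMatch\n";

// Default behaviour: "." excludes "\n", so nothing matches.
$without = preg_match_all("~startMatch(.+?)endMatch~", $content, $m1);

// With the "s" modifier, "." also matches "\n", so the match spans both lines.
$with = preg_match_all("~startMatch(.+?)endMatch~s", $content, $m2);

echo $without, "\n";               // 0
echo $with, "\n";                  // 1
echo json_encode($m2[1][0]), "\n"; // " text \n"
```

Note that group 1 captures " text \n", including the surrounding spaces and the newline, so wrapping it in trim() would be needed to get exactly text as in the expected results.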

frosty
  • Although I do not suggest that approach, I decided that the question is still a duplicate. You should learn to use unrolling-the-loop technique to forget about `/s` and enjoy high regex performance. Something like `startMatch([^e]*(e(?!ndMatch)[^e]*)*)endMatch` – Wiktor Stribiżew Nov 30 '15 at 21:42
  • @Mariano If I hadn't stumbled across this little fun fact today, I would forever think that I'm somehow writing my patterns incorrectly. – frosty Nov 30 '15 at 21:42
  • @stribizhev The what technique? How do I use that? – frosty Nov 30 '15 at 21:44
  • @stribizhev That pattern makes my head hurt trying to read it. Would using the 's' flag really slow down performance to a noticeable degree? If not, I'd rather use the simpler flag. – frosty Nov 30 '15 at 21:49
  • You should choose between readability and performance. Also, if you work with short strings, `.*?` or `.+?` is preferable. – Wiktor Stribiżew Nov 30 '15 at 21:51
  • @frosty This article explains the concept http://www.rexegg.com/regex-quantifiers.html#greedytrap – Mariano Nov 30 '15 at 21:58
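The unrolled-loop pattern suggested in the comments can be compared against the lazy-dot version on the same input. This is a sketch showing that both capture the same text on this one example, not a performance benchmark; the input string is again an assumed stand-in for test2.php.

```php
<?php
// Assumed stand-in for test2.php.
$content = "startMatch text \nendMatch\n";

// Lazy dot with the "s" modifier: simple and readable.
$lazy = '~startMatch(.+?)endMatch~s';

// Unrolled loop from the comment: [^e]* consumes runs of non-"e" characters
// (a negated class matches "\n" without any flag), and e(?!ndMatch) lets
// through any "e" that does not begin "endMatch".
$unrolled = '~startMatch([^e]*(e(?!ndMatch)[^e]*)*)endMatch~';

preg_match_all($lazy, $content, $a);
preg_match_all($unrolled, $content, $b);

// Group 1 holds the same text in both; the unrolled pattern needs no "s" flag.
echo json_encode($a[1]), "\n"; // [" text \n"]
echo json_encode($b[1]), "\n"; // [" text \n"]
```

Two differences worth knowing: the unrolled pattern also matches an empty body (startMatchendMatch), whereas .+? needs at least one character, and its inner group can be made non-capturing with (?:...) if the extra entry in $matches is unwanted.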
