1

I am trying to match only PHP code, such as the php code in this block:

<?php foo(); ?>

<abc>

<? foo(); ?>

<?php

foo();
bar();

?>

foo();
bar();

<? //also short open tag

foo();
bar();

?><?php

foo();
bar();

I want it to match only the code between PHP tags: both blocks that have an opening and a closing tag, and blocks that have only an opening tag with no closing tag (as can happen at the very end of a PHP file).

I tried many regex variants and finally ended up with the one below. It obviously doesn't work as I want: it is in /g mode, and it also selects <abc>, which it shouldn't (Demo):

<\?.*[\s\S]*?(?:$|\?\>)

Is there any way to achieve this with regex in /gm mode?

Please note that the reason I am asking is because I am using a file search program and when I am searching the content of the many php files I have, I want it to search only inside php code and not come up with results that are irrelevant. So I will use this regex as an additional condition to the rest of the content search. The search program uses PCRE /gm mode.

P.S. Before posting the question, I have done a lot of research on SO and could not find the solution to this question. Among other questions, I have also checked:

My regex is matching too much. How do I make it stop?

Get content between two strings PHP

Single regex to find string between two strings or started with single string only

Conclusion

I ended up using Julio's solution and extending it to also take into account single and double quotation marks, as in the example in Jan's answer. Thank you all for your answers. This is the final regex, which works in /gm mode:

<\?[\s\S]*?(?:\z|\?\>|[\"\'].*?[\"\'][\s\S]*?\?>)

Demo
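For anyone who wants to try the final pattern outside the search program, here is a minimal sketch in Python. It rests on a few assumptions: Python's re module stands in for the program's PCRE engine, PCRE's \z is spelled \Z in Python, the redundant escapes (\>, \", \') are dropped, and the names FINAL, TRICKY and TWO_STRINGS are made up for illustration. It also shows one input where the quote handling still stops too early, which is the kind of case Jan's parser advice is about.

```python
import re

# The final pattern from above, adapted for Python's re module:
# \z becomes \Z, and the redundant escapes are removed.
# re.MULTILINE mirrors the search program's /m flag (it has no effect here,
# because the pattern contains neither ^ nor $).
FINAL = re.compile(
    r"""<\?[\s\S]*?(?:\Z|\?>|["'].*?["'][\s\S]*?\?>)""",
    re.MULTILINE,
)

# Jan's example (a string literal that contains ?>), followed by some HTML
# and an unclosed PHP block at the end of the input.
TRICKY = '<?php\necho "This is hilarious ?>";\n?>\n<p>html</p>\n<?php bar();'

tricky_blocks = FINAL.findall(TRICKY)
for block in tricky_blocks:
    print(repr(block))

# A case the pattern still gets wrong: the quote branch only protects the
# first quoted string in a block, so a ?> inside a later string ends the match.
TWO_STRINGS = '<?php $a = "x"; echo "?>"; ?>'
two_strings_blocks = FINAL.findall(TWO_STRINGS)
print(repr(two_strings_blocks[0]))
```

On the first input both PHP blocks come back whole and the HTML in between is skipped. On the second input the match ends at the ?> inside the second string literal, so the pattern is good enough as a search filter but is not a substitute for a real tokenizer.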

Nikita 웃

3 Answers

2

Use this: <\?[\s\S]*?(?:\z|\?\>)

Demo

.*[\s\S]*? is redundant. You just need [\s\S]*? to match any character, newlines included. Also, since .* was greedy, it was swallowing the closing ?> on the same line.

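Also use \z instead of $: in /m mode $ matches at the end of every line, while \z matches only at the very end of the input.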
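To make the difference between $ and \z concrete, here is a minimal sketch in Python run against the sample from the question. It assumes Python's re module behaves closely enough to the search program's PCRE engine for this purpose; Python spells PCRE's \z as \Z, and the names SAMPLE, PHP_BLOCK and WITH_DOLLAR are made up for illustration.

```python
import re

# The sample text from the question.
SAMPLE = """<?php foo(); ?>

<abc>

<? foo(); ?>

<?php

foo();
bar();

?>

foo();
bar();

<? //also short open tag

foo();
bar();

?><?php

foo();
bar();
"""

# The suggested pattern; \Z is Python's spelling of PCRE's \z.
PHP_BLOCK = re.compile(r"<\?[\s\S]*?(?:\Z|\?>)", re.MULTILINE)

# The same pattern with $ instead. With the multiline flag, $ matches at the
# end of every line, so a multi-line block is cut off after its first line.
WITH_DOLLAR = re.compile(r"<\?[\s\S]*?(?:$|\?>)", re.MULTILINE)

blocks = PHP_BLOCK.findall(SAMPLE)
dollar_blocks = WITH_DOLLAR.findall(SAMPLE)

for block in blocks:
    print(repr(block))
print(repr(dollar_blocks[2]))
```

With \Z the sample yields five blocks, the last one running to the end of the input, and neither <abc> nor the bare foo(); bar(); lines between blocks are included. With $ the third block shrinks to just its opening tag.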

Julio
2

You could use

<\?(?:php)?        # <? or <?php
(?:(?!\?>)[\s\S])* # do not overrun ?> but match anything else greedily
(?:\?>)?           # ?> in the end

See a demo on regex101.com (mind the verbose flag!).


Let me emphasize that this is generally a bad approach when it comes to e.g. strings such as
<?php
echo "This is hilarious ?>";
?>

See the demo for the latter on regex101.com as well. Here, use a parser instead or rethink your original problem.
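The behaviour described above can be reproduced with a short Python sketch. It assumes Python's re module as a stand-in for PCRE (the verbose flag works the same way for this pattern); the names TEMPERED, TWO_BLOCKS and TRICKY are made up for illustration.

```python
import re

# The tempered greedy token pattern, written with the verbose flag so the
# inline comments and whitespace are ignored by the regex engine.
TEMPERED = re.compile(
    r"""
    <\?(?:php)?          # <? or <?php
    (?:(?!\?>)[\s\S])*   # any character, as long as it does not start ?>
    (?:\?>)?             # optional closing ?>
    """,
    re.VERBOSE,
)

# A closed block, some HTML, then an unclosed block at the end of the input.
TWO_BLOCKS = "<?php a(); ?><p>html</p><?php b();"
two_blocks_result = TEMPERED.findall(TWO_BLOCKS)
print(two_blocks_result)

# The problematic case: ?> inside a string literal ends the match early.
TRICKY = '<?php\necho "This is hilarious ?>";\n?>'
tricky_result = TEMPERED.findall(TRICKY)
print(tricky_result)
```

The first input gives both blocks, including the unclosed one. The second input is cut off right after the ?> inside the string, which is why a regex alone cannot be trusted here.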

Jan
  • Thanks @Jan. It's almost there, but doesn't match the bottom open php block. – Nikita 웃 Aug 10 '18 at 13:15
  • @CM웃: Updated, inserted a tempered greedy token. – Jan Aug 10 '18 at 13:22
  • Yes, I considered that issue as well. I think other negative lookarounds could be added to avoid the quotation marks, right? Also, just curious, which parsers do you recommend? – Nikita 웃 Aug 10 '18 at 13:29
  • @CM웃: Depending on your actual problem you might as well need to write your own parser. – Jan Aug 10 '18 at 13:31
  • How about this solution, to also cover the quotation marks issue? `<\?[\s\S]*?(?:\z|\?\>|[\"\'].*?[\"\'][\s\S]*?\?>)` – Nikita 웃 Aug 10 '18 at 14:17
0

This should work for you:

(<\?)(.*?)(?:$|\?>)/isg

Online example.

Ilia Ross