I am trying to build a regular expression that matches different types of echo statements. The word echo has already been matched.

Example patterns to be matched

"hiii";
"how"."are"."you";
$var."abc";
"abc".$var;
'how'."how".$var;

Pattern for a variable name:

/^[a-zA-Z_][a-zA-Z0-9_]*/

I already have a pattern that matches the first two examples:

/((^"[^"]*"\.{0,1})*;)/
hakre
Aamir
  • Why do you think a recursive approach is better? Why do you need to do this? Maybe there's a better approach. – Amal Murali Apr 27 '14 at 08:57
  • @AmalMurali Because the expression needs to repeat only on encountering a . (dot). – Aamir Apr 27 '14 at 09:17
  • After reading your updated question, I can tell regex is not the best way to accomplish this task. You're better off with an actual parser. Take a look at [NikiC's PHP parser](https://github.com/nikic/PHP-Parser). – Amal Murali Apr 27 '14 at 09:20
  • I completely agree with you. I realized this after starting it in PHP and tried to look at some of the available parsers, but couldn't figure out how to make them work. About my... After entering it in a textbox, when the user clicks the submit button, all of this needs to be done in the background automatically. – Aamir Apr 27 '14 at 09:25

3 Answers

Regular expressions aren't a solution for everything. For example, in this case it's easily noticeable you want to parse PHP code. Just like you shouldn't parse HTML with regex, you shouldn't parse PHP with regex.

Instead, use PHP's tokenizer, which splits PHP source code into tokens that you can then inspect.
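
As a minimal sketch of how the tokenizer could be applied to this question: the helper below (the name isSimpleEchoArgument is hypothetical, not from the answer) prepends an open tag and the already-matched echo keyword, tokenizes the result with token_get_all(), and then checks that the tokens alternate between operands and the . operator before a final semicolon. It assumes only plain string literals and variables count as operands.

```php
<?php
// Hypothetical helper (not part of the original answer): returns true when
// $argument is a chain of plain string literals and variables joined by "."
// and terminated by ";".
function isSimpleEchoArgument(string $argument): bool
{
    // The tokenizer only recognises PHP code after an open tag, so prepend
    // one together with the already-matched "echo" keyword.
    $tokens = token_get_all('<?php echo ' . $argument);

    // Drop the two tokens we added ourselves: T_OPEN_TAG and T_ECHO.
    $tokens = array_slice($tokens, 2);

    $expectOperand = true;  // true: the next token must be a string or variable
    $terminated = false;    // becomes true once ";" has been seen

    foreach ($tokens as $token) {
        // Tokens are [id, text, line] arrays or single-character strings.
        $id = is_array($token) ? $token[0] : $token;

        if ($id === T_WHITESPACE) {
            continue;       // whitespace is allowed between tokens
        }
        if ($terminated) {
            return false;   // nothing may follow the ";"
        }
        if ($expectOperand) {
            if ($id !== T_VARIABLE && $id !== T_CONSTANT_ENCAPSED_STRING) {
                return false;
            }
            $expectOperand = false;
        } elseif ($id === '.') {
            $expectOperand = true;
        } elseif ($id === ';') {
            $terminated = true;
        } else {
            return false;
        }
    }

    return $terminated;
}
```

Double-quoted strings that interpolate variables, such as "hello $name", are split by the tokenizer into several tokens rather than one T_CONSTANT_ENCAPSED_STRING, so this sketch rejects them; supporting them would need extra states.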

Konrad Borowski
  • I saw the tokenizer; I guess it is going to be useful for me in other parts of my project, but I didn't understand how it can be used with respect to this question. – Aamir Apr 27 '14 at 10:54

Besides the two suggestions already given: if you're looking for PCRE-based regexes in PHP to validate a subset of PHP, you can structure the pattern by defining named subpatterns for the tokens you're looking for. Here is an example pattern that matches these inputs and also allows surrounding whitespace (as PHP would), for any US-ASCII-based extended single-byte charset (I think this is how PHP actually treats it, even if your files are UTF-8):

~
(?(DEFINE)
    (?<stringDoubleQuote> "(?:\\"|[^"])+")
    (?<stringSingleQuote> '(?:\\'|[^'])+')
    (?<string> (?:(?&stringDoubleQuote)|(?&stringSingleQuote)))
    (?<variable> \\\$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*))
    (?<varorstring> (?:(?&variable)|(?&string)))
)
^ \s* (?&varorstring) (?: \s* \. \s* (?&varorstring) )* \s* ; $
~x

Thanks to the named subpatterns, it's easy to reuse a token for any string or variable and to add the whitespace handling and the string concatenation operator. With the pattern assigned to $pattern, an example of use is:

$lines = <<<'LINES'
"hiii";
"how"."are"."you";
$var."abc";
"abc".$var;
'how'."how".$var;
LINES;

foreach (explode("\n", $lines) as $subject) {
    $result = preg_match($pattern, $subject);
    if (FALSE === $result) {
        throw new LogicException('PCRE pattern did not compile.');
    }
    printf("%s %s match.\n", var_export($subject, true), $result ? 'did' : 'did not');
}

Output:

'"hiii";' did match.
'"how"."are"."you";' did match.
'$var."abc";' did match.
'"abc".$var;' did match.
'\'how\'."how".$var;' did match.

Demo: https://eval.in/142721
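
The assignment to $pattern is not shown above, so here is one possible way to do it, offered as a sketch rather than as the code behind the demo. It assumes a nowdoc, which passes backslashes through unchanged; for that reason the variable subpattern is written with a single \$ here (a literal dollar sign in PCRE), while the \\ in the string subpatterns stays as it is (a literal backslash).

```php
<?php
// Assumption: the pattern is stored in a nowdoc, so PHP performs no escape
// processing and PCRE receives the text exactly as written.
$pattern = <<<'PATTERN'
~
(?(DEFINE)
    (?<stringDoubleQuote> "(?:\\"|[^"])+")
    (?<stringSingleQuote> '(?:\\'|[^'])+')
    (?<string> (?:(?&stringDoubleQuote)|(?&stringSingleQuote)))
    (?<variable> \$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*))
    (?<varorstring> (?:(?&variable)|(?&string)))
)
^ \s* (?&varorstring) (?: \s* \. \s* (?&varorstring) )* \s* ; $
~x
PATTERN;

// A lone variable is a valid operand chain of length one.
var_dump(preg_match($pattern, '$var;'));    // int(1)

// The string subpatterns use "+", so an empty string literal is rejected.
var_dump(preg_match($pattern, '"";'));      // int(0)

// A dangling concatenation operator is rejected as well.
var_dump(preg_match($pattern, '"abc".;'));  // int(0)
```

If the pattern is placed in a double-quoted string or a heredoc instead, PHP processes the backslash escapes first, so the escaping has to be adjusted accordingly.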

hakre
  • Genius! I'm not so good at PHP, so I'm not able to understand a few things (but I understood your logic). Can you tell me more about the pattern or point to a tutorial? I tried to google but didn't find any proper answers. For instance, why have you used <<<, ~ and DEFINE in the pattern? Does that mean you're giving a name to the sub-pattern? Also, how can it be altered to match only $var; or ""; (if possible)? Thanks a lot – Aamir Apr 27 '14 at 10:52
  • The [(?(DEFINE) syntax](http://www.rexegg.com/regex-disambiguation.html#define) is a little-known feature. Interesting and detailed answer. – Jonny 5 Apr 27 '14 at 12:31
  • @Aamir: Everything on how to write strings in PHP is outlined in the PHP manual: http://php.net/string - And everything about how to write a PCRE regular expression is outlined in the Perl documentation (PCRE aims to be compatible): http://perldoc.perl.org/perlre.html#Extended-Patterns (sorry, it's a lot to read and regexes are sometimes hard to wrap your head around, but these are the two references you can safely rely on) – hakre Apr 27 '14 at 12:42
  • Thanks a lot for the references, bro. Now I understand it a lot better, but why is $var; on its own not matching? As far as I understand, the same pattern should be able to match it, but it does not. – Aamir Apr 27 '14 at 17:05
  • @Aamir: For me, it matches: https://eval.in/142892 - All I did was add it (and make the pattern more readable, but that should not have changed its behavior). Try also by just editing the original demo. – hakre Apr 27 '14 at 17:30
  • @hakre Strange, I copied the code from eval.in now and it's working. I modified the expression to include a few more patterns and it still works fine, I guess. Thanks again, bro. – Aamir Apr 27 '14 at 19:28
  • @hakre In your expression for single and double quotes, (?<stringDoubleQuote> "(?:\\"|[^"])+") (?<stringSingleQuote> '(?:\\'|[^'])+'), why is \\ used? Why can't it just be "[^"]+?" – Aamir Apr 29 '14 at 19:34
  • Those are necessary because strings can contain their own quotes when escaped. The escape sequence starts with a backslash, which is then written as a double backslash so that it's preserved. See the last line in the examples here: https://eval.in/private/3fe83c071b4c7e – hakre May 03 '14 at 09:04
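
The point about escaped quotes can be checked with a small sketch. The two patterns below are illustrative reductions of the double-quoted string token (the variable names are made up for this example): one without the backslash alternative and one with it, both anchored so that the whole subject must be a single string literal.

```php
<?php
// Both patterns are illustrative. In a single-quoted PHP string, four
// backslashes become two characters, which PCRE reads as one literal backslash.
$naive       = '~^"[^"]+"$~';
$escapeAware = '~^"(?:\\\\"|[^"])+"$~';

// A string literal that contains escaped double quotes.
$subject = '"say \"hi\" now"';

// Without the backslash alternative, the first inner quote ends the token
// and the rest of the subject is left over, so the anchored match fails.
var_dump(preg_match($naive, $subject));        // int(0)

// With it, a backslash followed by a quote is consumed as one unit.
var_dump(preg_match($escapeAware, $subject));  // int(1)
```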

You can do that with the following regex without needing to use recursion:

^"[^"]+"(\."[^"]+")*;$

Demo: http://regex101.com/r/oW5zH4
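
A short sketch of how this regex behaves in PHP. The / delimiters and the second, more lenient variant (with * in place of +) are assumptions added for illustration; the variant is one way to accept empty strings such as "";.

```php
<?php
// The regex from this answer, wrapped in "/" delimiters for preg_match().
$strict = '/^"[^"]+"(\."[^"]+")*;$/';

// Assumed variant: "*" instead of "+" so that empty string literals pass too.
$lenient = '/^"[^"]*"(\."[^"]*")*;$/';

var_dump(preg_match($strict, '"how"."are"."you";'));  // int(1)
var_dump(preg_match($strict, '"";'));                 // int(0)
var_dump(preg_match($lenient, '"";'));                // int(1)

// Neither variant covers variables or single-quoted strings.
var_dump(preg_match($lenient, '$var."abc";'));        // int(0)
```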

sshashank124
  • Thanks, that works great. Can we include the other patterns in the same regex? (See the modified question.) – Aamir Apr 27 '14 at 09:15
  • But it won't match "";, which is the reason I included * in the beginning. Sorry for not giving enough examples. Basically, I want to match almost all types of echo provided by PHP (as far as possible). – Aamir Apr 27 '14 at 09:20