
I use a function __() to translate strings, and I added an interface that automatically finds all these translation calls in all files. This is (supposed to be) done with the following regex:

<?php
$pattern = <<<'LOD'
`
  __\(
    (?<quote>               # GET THE QUOTE
    (?<simplequote>')       # catch the opening simple quote
    |
    (?<doublequote>")       # catch the opening double quote
    )
    (?<param1>              # the string will be saved in param1
      (?(?=\k{simplequote}) # if condition "simplequote" is ok
        (\\'|"|[^'"])+      # allow escaped simple quotes or anything else
        |                   #
        (\\"|'|[^'"])+      # allow escaped double quotes or anything else
      )
    )
    \k{quote}             # find the closing quote
    (?:,.*){0,1}          # catch any type of 2nd parameter
  \)
  # modifiers:
  #  x to allow comments :)
  #  m for multiline,
  #  s for dotall
  #  U for ungreedy
`smUx
LOD;
 $files = array('/path/to/file1',);
 foreach($files as $filepath)
 {
   $content = file_get_contents($filepath);
   if (preg_match_all($pattern, $content, $matches))
   {
     foreach($matches['param1'] as $found)
     {
       // do things
     }
   }
 }

This regex does not work for some single-quoted strings containing an escaped single quote (\'). In fact, it seems that whether the string is single- or double-quoted, the condition is evaluated as false, so the "else" branch is always used.

<?php
// content of '/path/to/file1'
echo __('simple quoted: I don\'t "see" what is wrong'); // does not work
echo __("double quoted: I don't \"see\" what is wrong"); // works

For file1, I expect both strings to be found, but only the double-quoted one is.

Edit: added more PHP code to make it easier to test.

Asenar
  • Could you post some valid and invalid examples along with the expected output? – Avinash Raj Jan 14 '15 at 14:58
  • Have a look at http://stackoverflow.com/questions/6243778/split-string-by-delimiter-but-not-if-it-is-escaped; the top answer provides an example of how to capture escaped sequences. – eisberg Jan 14 '15 at 14:59
  • I just edited, @AvinashRaj. I hope this is enough. – Asenar Jan 14 '15 at 15:03

2 Answers


Use the regex below and get the string you want from capture group 2.

__\((['"])((?:\\\1|(?!\1).)*)\1\)

Explanation:

  • __\( Matches the literal characters __( .

  • (['"]) Captures the opening double or single quote in group 1.

  • (?:\\\1|(?!\1).)* Matches, zero or more times, either an escaped quote (a backslash followed by the quote character captured in group 1) or any character that is not that quote character, which is what (?!\1). checks.

  • \1 Matches the same character that group 1 captured, i.e. the closing quote.
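
As a quick check of the explanation above, here is a minimal sketch that runs this regex over the two sample lines from the question. The `~` delimiter, the `s` modifier, and the variable names are assumptions made for the example, not part of the answer.

```php
<?php
// The regex from this answer, wrapped in '~' delimiters (assumed for this sketch).
// A nowdoc keeps the backslashes exactly as written.
$pattern = <<<'RE'
~__\((['"])((?:\\\1|(?!\1).)*)\1\)~s
RE;

// The two sample lines from the question, standing in for the file content.
$content = <<<'SRC'
echo __('simple quoted: I don\'t "see" what is wrong');
echo __("double quoted: I don't \"see\" what is wrong");
SRC;

preg_match_all($pattern, $content, $matches);

// Group 1 holds the quote character, group 2 the raw string between the quotes
// (escape backslashes are kept as they appear in the source).
foreach ($matches[2] as $found) {
    echo $found, PHP_EOL;
}
```

Note that this pattern only matches calls with a single argument; the question's optional second parameter would need an extra part before the closing \).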

Avinash Raj
  • Thank you for the explanation. This works on the demo but not in my code... trying to figure out why. – Asenar Jan 14 '15 at 15:15
  • Can you explain this to me: according to the docs (http://php.net/manual/fr/regexp.reference.assertions.php), I understand `(?!\1).)` as "match anything (the `.`) unless there is a quote before it", which is supposed to happen at the first character after the quote. What am I missing? – Asenar Jan 14 '15 at 16:06
  • `(?!\1).` means match any character `.` except the one captured in group 1. If group 1 contains a double quote, `(?!\1).` matches any character except a double quote. Likewise for a single quote. – Avinash Raj Jan 14 '15 at 16:20
  • OK, thanks a lot! I just understood how the lookahead works: "pause here; now, let's see what the next character looks like... is it allowed? OK, you can continue". – Asenar Jan 14 '15 at 17:25

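Avinash Raj's solution is more elegant and probably more efficient (so I accepted it), but I just found my mistake, so I am posting the fix here. The condition has to test whether the "simplequote" group matched, written (?(simplequote), instead of looking ahead for a backreference to it, and each branch only needs to exclude its own quote character: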

<?php
$pattern = <<<'LOD'
`
  __\(
    (?<quote>               # GET THE QUOTE
    (?<simplequote>')       # catch the opening simple quote
    |
    (?<doublequote>")       # catch the opening double quote
    )
    (?<param1>              # the string will be saved in param1
      (?(simplequote)       # if condition "simplequote" 
        (\\'|[^'])+         # allow escaped simple quotes or anything else
        |                   #
        (\\"|[^"])+         # allow escaped double quotes or anything else
      )
    )
    \k{quote}               # find the closing quote
    (?:,.*){0,1}            # catch any type of 2nd parameter
  \)
  # modifiers:
  #  x to allow comments :)
  #  m for multiline,
  #  s for dotall
  #  U for ungreedy
`smUx
LOD;
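
To see the difference between the two conditions in isolation, here is a compacted sketch of both patterns. The `~` delimiter, the single-line layout, the dropped second-parameter part, and the variable names are assumptions made for the example; only the conditional and the character classes are taken from the code above.

```php
<?php
// Condition from the question: a lookahead for a backreference. Right after the
// opening quote the next character is normally not another quote, so the
// lookahead fails and the "else" branch is used.
$broken = <<<'RE'
~__\((?<quote>(?<simplequote>')|")(?<param1>(?(?=\k{simplequote})(?:\\'|"|[^'"])+|(?:\\"|'|[^'"])+))\k{quote}\)~
RE;

// Condition from this answer: true when the "simplequote" group took part in the match.
$fixed = <<<'RE'
~__\((?<quote>(?<simplequote>')|")(?<param1>(?(simplequote)(?:\\'|[^'])+|(?:\\"|[^"])+))\k{quote}\)~
RE;

$single = <<<'SRC'
echo __('simple quoted: I don\'t "see" what is wrong');
SRC;
$double = <<<'SRC'
echo __("double quoted: I don't \"see\" what is wrong");
SRC;

$brokenOnSingle = preg_match($broken, $single);          // question's condition
$fixedOnSingle  = preg_match($fixed, $single, $mSingle); // this answer's condition
$fixedOnDouble  = preg_match($fixed, $double, $mDouble);

var_dump($brokenOnSingle, $fixedOnSingle, $fixedOnDouble);
echo $mSingle['param1'], PHP_EOL;
echo $mDouble['param1'], PHP_EOL;
```

The named group `param1` holds the raw text between the quotes, with the escape backslashes left in place.
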
Asenar