
I have this regex that I got from a website for pulling email addresses out of a file:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

I've tested it in RegexBuddy (regex testing software) and it works!

When I copy and paste the regex from RegexBuddy into my PHP file, I have to escape the two " characters so that the regex forms a valid PHP string.

In PHP I use it like this:

$file = file_get_contents(/* URL TO GET */);

$email_pattern = "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])";

$matches = array();

if ( preg_match_all ( $email_pattern, $file, $matches ))
{
    echo print_r($matches, true);
}

But I get this warning:

Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '@'

Yet the same regex works in RegexBuddy.

Where am I going wrong?

AlexMorley-Finch

2 Answers


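Two things:

Step 1:

You need to put delimiters around the pattern (the / before and after it; modifiers such as i can then go after the closing one). PHP treats the first character of the pattern as the delimiter. In your pattern that is (, so PHP takes the matching ) as the closing delimiter and tries to read everything after it, starting with @, as modifiers. Hence "Unknown modifier '@'":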

$email_pattern = "/(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/";
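To see where the "Unknown modifier" warning comes from, here is a minimal sketch with a short made-up pattern instead of the email regex. When a pattern has no explicit delimiters, PHP takes its first character as the delimiter; for an opening bracket such as ( the closing delimiter is the matching ), and whatever follows is parsed as modifiers.

```php
<?php
// Made-up short patterns that reproduce the same class of error as the
// email regex in the question.

// No delimiters: PHP uses "(" as the opening delimiter and the matching ")"
// as the closing one, so the pattern is just "ab" and "c" is read as a
// modifier. preg_match() emits "Unknown modifier 'c'" and returns false;
// the @ only hides that warning for this demo.
$broken = @preg_match('(ab)c', 'abc');

// With "/" delimiters the whole (ab)c is the pattern, and modifiers go
// after the closing "/" ("i" makes the match case-insensitive).
$fixed = preg_match('/(ab)c/i', 'ABC');

var_dump($broken); // bool(false)
var_dump($fixed);  // int(1)
```

In the question's pattern, the first top-level group closes right before the @, which is why the warning names @.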

Step 2:

And since you're in a PHP string, you'll need to escape the characters that are special inside that string (for example, \ must become \\, and in a double-quoted string $ must become \$, etc.).

So, escaped for inclusion in a PHP string, the regex would look like this:

(?:[a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*|\\\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\\\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])
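A small sketch of what PHP itself does to backslash sequences in each kind of string literal, before PCRE ever sees the pattern. The variable names are just for illustration.

```php
<?php
// Double quotes: PHP turns "\x01" into one raw byte and "\\" into a single
// backslash, so "\\[" reaches PCRE as \[ (an escaped "[").
$dq_hex = "\x01";
$dq_bs  = "\\[";

// Single quotes: only \\ and \' are escape sequences, so '\x01' stays as
// four characters that PCRE later reads as a hex escape.
$sq_hex = '\x01';
$sq_bs  = '\\[';

// In double quotes "$name" would be interpolated as a variable;
// "\$" keeps the dollar sign literal.
$dq_dollar = "\$name";

var_dump(strlen($dq_hex));   // int(1)
var_dump(strlen($sq_hex));   // int(4)
var_dump($dq_bs === $sq_bs); // bool(true), both are the two characters \[
var_dump($dq_dollar);        // string(5) "$name"
```

So to hand PCRE the two characters \\ (a literal backslash in the regex), you have to write \\\\ in either kind of PHP string, which is what the single-quoted pattern further down does.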

You also have to escape /, because that character is the delimiter from step 1. The regex needs to see \/; in the PHP string you can write that as \/ (or as \\/, which PHP also turns into \/).
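A minimal sketch of the delimiter issue on a made-up subject string. It also shows a different approach from escaping: choosing a delimiter that does not appear in the pattern. For the email regex that rules out #, ~ and !, since they all occur in its character classes.

```php
<?php
// Made-up subject containing a slash.
$text = 'path: b/c';

// Single-quoted '\/' and '\\/' are the same two characters \/,
// which is what the regex needs to see.
var_dump('\/' === '\\/'); // bool(true)

// "/" as delimiter: the "/" inside the pattern is escaped as \/.
$escaped = preg_match('/b\/c/', $text);

// "#" as delimiter: the "/" in the pattern needs no escaping.
$other_delim = preg_match('#b/c#', $text);

var_dump($escaped);     // int(1)
var_dump($other_delim); // int(1)
```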

If I'm right (I usually do this conversion with RegexBuddy's PHP export tool, but I don't have it right now, so I did it by hand), it should look something like this:

$email_pattern = '/(?:[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/';

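I would also suggest putting the pattern in a single-quoted string: there, only \' and \\ are escape sequences, so $ and sequences like \x01 are passed to the regex engine unchanged.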
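Putting it together, a sketch that plugs the single-quoted pattern above into the question's preg_match_all() call. The sample text is made up and stands in for the file_get_contents() result.

```php
<?php
// Single-quoted pattern from the answer. Inside '...' PHP only processes \'
// and \\, so \x01, \. and \/ reach PCRE unchanged, and \\\\ becomes \\
// (a literal backslash in the regex).
$email_pattern = '/(?:[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/';

// Made-up sample text standing in for file_get_contents(/* URL TO GET */).
$file = 'Contact alice@example.com or bob.smith@mail.example.org.';

$matches = array();
if (preg_match_all($email_pattern, $file, $matches)) {
    // $matches[0] holds the full matches.
    print_r($matches[0]);
}
```

This should print the two addresses. The pattern only lists lowercase letters, so an address such as Alice@Example.com would only be matched in part; adding the i modifier after the closing delimiter ('/.../i') is one way to make it case-insensitive.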

FMaz008
  • I copied and pasted your answer and got this error: Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '='. But the equals sign is already escaped? I also used preg_quote to auto-escape regex special characters, but then I got the error: Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '\' – AlexMorley-Finch Feb 10 '12 at 14:56
  • You copied and pasted my answer for step 1, but did you escape as I mentioned in step 2? – FMaz008 Feb 10 '12 at 15:06
  • I'll try that now. So \ becomes \\ and $ becomes \$, but are there any more I should be aware of? Is there a list online or something? – AlexMorley-Finch Feb 10 '12 at 15:09
  • @AlexMorley-Finch: escape also the slashes `\/` – Toto Feb 10 '12 at 15:10
  • @AlexMorley-Finch I see you might have trouble understanding the basics of strings. I've just updated my answer; I hope it helps a bit more this time :) – FMaz008 Feb 10 '12 at 15:20
  • ```'/(?:[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/'``` is what RegexBuddy comes up with for me. You're goin' a bit overboard with the backslashes! – Alan Moore Feb 10 '12 at 15:37
  • Well, thanks. I don't have the tool right now (I'm on OS X); I'll update the answer with that. (At first I ran addslashes() to get the regex, so extra characters got backslashed...) – FMaz008 Feb 10 '12 at 15:39

I tried it, and...

Single quotes give an error...

Use double quotes and {} as delimiters; / delimiters give an error too.

Jorge Pinho
  • Single quotes would work, and the / delimiter would not give an error if you escape those in the regex. Right now even the $ is interpreted because of the double quotes. With single quotes, almost only \ and ' need to be escaped. – FMaz008 Feb 10 '12 at 14:58
  • You're right! PHP will be fine, but not the regexp for the email check – Jorge Pinho Feb 10 '12 at 14:59
  • You have to escape FOR PHP, THEN for the regex, so in some cases you might end up with quadruple backslashes; double quotes just make the problem harder. – FMaz008 Feb 10 '12 at 15:01
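To illustrate the "quadruple backslashes" point, a minimal sketch on a made-up Windows-style path: matching the literal two characters \d needs \\d in the regex, and each of those backslashes is doubled again in the PHP literal.

```php
<?php
// Made-up subject: the PHP literal 'C:\\dir' is the 6 characters C:\dir.
$subject = 'C:\\dir';

// Regex \\d = a literal backslash followed by "d".
// Written as a PHP literal, each backslash is doubled: '/\\\\d/'.
$literal = '/\\\\d/';
$m_literal = preg_match($literal, $subject);

// With only two backslashes in the literal, PCRE receives \d,
// which means "any digit", and there are no digits in the subject.
$m_digit = preg_match('/\\d/', $subject);

var_dump(strlen($subject)); // int(6)
var_dump(strlen($literal)); // int(5)
var_dump($m_literal);       // int(1)
var_dump($m_digit);         // int(0)
```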