
I'm trying to extract the text between a given string and the first non-alphanumeric character. The code below works, but it uses the closing tag instead of `\W`.

$my_string = 'Auth code: 02452A</div>';
preg_match("~Auth code:(.*)</div>~",$my_string, $m);
print_r($m);
// shouldn't this work, too?
preg_match("~Auth code:(.*)\W~",$my_string, $m);
Banditvibe
  • Not a duplicate of what you indicate, @Wiktor. @Banditvibe, you can just add the g flag: `preg_match("~Auth code:(.*)\W~g",$my_string, $m);`. See https://stackoverflow.com/questions/12993629/what-is-the-meaning-of-the-g-flag-in-regular-expressions – Pierre Granger Jul 26 '17 at 20:02
  • @PierreGranger: Ok, might not be a dupe. Still, it is PHP and not JS. `g` modifier is not supported, to get multiple matches in PHP, you need to use `preg_match_all`. I think `preg_match("~Auth code:\s*(.*?)\W~",$my_string, $m);` will work, but `"~Auth code:\s*\K\w+~"` is much better. See [**this IDEONE demo**](http://ideone.com/jVqRS5). – Wiktor Stribiżew Jul 26 '17 at 20:10
  • @PierreGranger Ok, preg_match_all for global – Banditvibe Jul 26 '17 at 20:12
  • @Banditvibe Is http://ideone.com/jVqRS5 what you need? – Wiktor Stribiżew Jul 26 '17 at 20:13
  • Well, I can't explain why, but `preg_match("~Auth code:(.*)\W~g",$my_string, $m);` works for me... `Array ( [0] => Auth code: 02452A [1] => 02452A )` – Pierre Granger Jul 26 '17 at 20:14
  • @PierreGranger: see [this demo](http://ideone.com/DbDeTJ): *PHP Warning: preg_match(): Unknown modifier 'g'* and `$m` is empty. – Wiktor Stribiżew Jul 26 '17 at 20:16
  • @wiktor-stribiżew yes, perfect. Like the \K – Banditvibe Jul 26 '17 at 20:17
  • Yeah, I've seen the warning; it's just funny how it works even with the warning. Probably more a bug than anything, but still funny :) – Pierre Granger Jul 27 '17 at 06:46
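
Since PHP has no `g` modifier (it only produces an "Unknown modifier" warning), the PHP way to collect every match is `preg_match_all()`. A minimal sketch, assuming a hypothetical input that contains two auth codes; `$text`, `$count` and `$matches` are illustrative names, not from the question:

```php
<?php
// Hypothetical input with two auth codes, used only to show multiple matches.
$text = 'Auth code: 02452A</div><div>Auth code: 7F3B9C</div>';

// No /g in PHP: preg_match_all() finds all non-overlapping matches
// and returns how many it found.
$count = preg_match_all('~Auth code:\s*\K\w+~', $text, $matches);

// With the default PREG_PATTERN_ORDER, $matches[0] lists the full matches;
// because of \K, each full match is just the code.
print_r($matches[0]);
```
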

1 Answer


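The `Auth code:(.*)</div>` pattern matches the literal substring `Auth code:`, then matches and captures into Group 1 any 0+ chars other than line break chars, as many as possible (since `*` is a greedy quantifier), and then matches `</div>`, an obligatory literal substring. With `Auth code:(.*)\W`, the greedy `.*` first runs to the end of the string and then backtracks only until `\W` can match. The last non-word char in the string is the final `>`, so Group 1 captures the leading space plus `02452A</div`.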

If you replace `.*` with `.*?` (a lazy version), you still won't get the result you need, because there is a space after `:` and `\W` matches a space. So `.*?` will match an empty string between `:` and the space.
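
A small sketch that runs the question's greedy `(.*)\W` pattern, the lazy `(.*?)\W` variant, and the `\s*(.*?)\W` variant suggested in the comments against the same string, so the difference is visible. Variable names are illustrative; the values in the comments follow from how greedy and lazy quantifiers backtrack:

```php
<?php
$my_string = 'Auth code: 02452A</div>';

// Greedy: .* runs to the end, then gives back chars until \W can match.
// The last non-word char is ">", so the capture keeps "</div".
preg_match('~Auth code:(.*)\W~', $my_string, $greedy);
var_dump($greedy[1]); // string(12) " 02452A</div"

// Lazy: .*? first tries an empty capture, and \W immediately matches
// the space after the colon, so Group 1 is empty.
preg_match('~Auth code:(.*?)\W~', $my_string, $lazy);
var_dump($lazy[1]); // string(0) ""

// Consuming the whitespace with \s* first lets the lazy group expand
// until the first non-word char after the code, which is "<".
preg_match('~Auth code:\s*(.*?)\W~', $my_string, $fixed);
var_dump($fixed[1]); // string(6) "02452A"
```
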

The best way to get the substring you need is to add `\s*` (0 or more whitespace chars) after `:`, then use the match reset operator `\K`, which omits the text matched so far from the returned match, and then match 1 or more word chars. This is much more efficient than matching any chars lazily up to the first non-word char:

~Auth code:\s*\K\w+~

Details:

  • Auth code: - a literal substring
  • \s* - 0+ whitespaces
  • \K - a match reset operator
  • \w+ - 1 or more word chars
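
To illustrate what `\K` changes, here is a minimal comparison with the more common capturing-group form `~Auth code:\s*(\w+)~`, which is shown only for contrast. Both find the same code; the difference is where it ends up in `$m`. Variable names are illustrative:

```php
<?php
$my_string = 'Auth code: 02452A</div>';

// With \K, everything matched before it is dropped from the overall match,
// so the code is in element 0 and no capturing group is needed.
preg_match('~Auth code:\s*\K\w+~', $my_string, $withK);

// With a capturing group, the full match still contains the label,
// and the code has to be read from element 1.
preg_match('~Auth code:\s*(\w+)~', $my_string, $withGroup);

echo $withK[0], "\n";     // 02452A
echo $withGroup[0], "\n"; // Auth code: 02452A
echo $withGroup[1], "\n"; // 02452A
```
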

See the PHP demo online:

<?php
$my_string = 'Auth code: 02452A</div>';
// \K drops "Auth code: " from the reported match, so $m[0] holds only the code
preg_match("~Auth code:\s*\K\w+~", $my_string, $m);
print_r($m[0]); // => 02452A
Graham
Wiktor Stribiżew