
I want to extract a Twitter username from a URL.

E.g: https://twitter.com/jack => jack

I found this regex helpful:

  if (preg_match("/^https?:\/\/(www\.)?twitter\.com\/(#!\/)?(?<name>[^\/]+)(\/\w+)*$/", $url, $regs)) {
    return $regs['name'];
  }

It doesn't seem to work when the Twitter URL contains query parameters.

For example, https://twitter.com/jack?lang=en returns jack?lang=en

Any idea how to improve the regex to prevent this?

0x0

1 Answer

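// Option 1: regex. The name is everything after the host, up to an optional "?".
preg_match('/https?:\/\/twitter\.com\/(?<name>[^\?]+)\??.*/', 'https://twitter.com/jack?lang=en', $m);
var_dump(trim($m['name'])); // string(4) "jack"

// Option 2: parse_url() returns the path without the query string; then strip the slashes.
$path = parse_url('https://twitter.com/jack?lang=en', PHP_URL_PATH);
var_dump(str_replace('/', '', $path)); // string(4) "jack"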
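
The parse_url() approach above can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `twitterUsernameFromUrl` is hypothetical, it accepts only the hosts `twitter.com` and `www.twitter.com`, and it treats the first path segment as the username, so a longer path such as `/jack/status/20` still yields `jack`.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the first
// path segment of a twitter.com URL, or null when the URL does not fit.
function twitterUsernameFromUrl(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);

    // parse_url() yields null for a missing component and false for a malformed URL.
    if (!is_string($host) || !is_string($path)) {
        return null;
    }

    // Assumption: only these two host spellings count as Twitter.
    if (!in_array(strtolower($host), ['twitter.com', 'www.twitter.com'], true)) {
        return null;
    }

    // The query string is not part of the path, so "?lang=en" never reaches this point.
    $segments = explode('/', trim($path, '/'));

    return $segments[0] !== '' ? $segments[0] : null;
}
```

Compared with stripping every slash from the path, taking the first segment avoids gluing `jack` and `status` together when the URL points at a tweet rather than a profile.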
BambinoUA
  • Can you add an explanation about the pattern `https?:\/\/twitter\.com\/(?<name>[^\?]+)?\??.*`? The part `(?<name>[^\?]+)?` is optional, `\??` is also optional, and `.*` matches any character zero or more times. So everything after `https://twitter.com/` is optional, and the `.*` will match the whole line. – The fourth bird Nov 21 '20 at 12:12
  • It seems I got lost in the question marks. I fixed the regexp. The name part is, of course, not optional. The query question mark is optional, and everything after it is optional too. So the regexp means: scheme, host, then everything up to the question mark is the name, then a literal "?" (which may be absent), followed by any characters (or none). – BambinoUA Nov 21 '20 at 22:26
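
Following the reading of the pattern in the comments above, one possible tightening is to end the name at the first `/`, `?` or `#` rather than only at `?`. The sketch below is an assumption, not the answerer's pattern: the function name `twitterNameByRegex` is hypothetical, and the optional `www.` and `#!/` parts are carried over from the regex in the question.

```php
<?php
// Hypothetical regex variant: the name is required and stops at "/", "?" or "#".
function twitterNameByRegex(string $url): ?string
{
    // "~" is used as the delimiter so the slashes in the URL need no escaping.
    $pattern = '~^https?://(?:www\.)?twitter\.com/(?:#!/)?(?<name>[^/?#]+)~';

    // preg_match() returns 1 on a match, 0 on no match.
    if (preg_match($pattern, $url, $m) === 1) {
        return $m['name'];
    }

    return null;
}
```

Because the name group is not optional here, a bare `https://twitter.com/` does not match at all, which addresses the point that everything after the host could otherwise be empty.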