I know there are already a lot of questions about this, but I've tried many of the answers and can't get the result I need.

I need a regex that will extract a youtube url from a string that contains an iframe.

Sample text:

<p>

</p><p>Garbage text</p><p><iframe width="560" height="315" src="//www.youtube.com/embed/PZlJFGgFTfA" frameborder="0" allowfullscreen=""></iframe></p>

Here's the regex I came up with:

(\bhttps?:)?\/\/[^,\s()<>]+(?:\([\w\d]+\)|(?:[^,[:punct:]\s]|\/))

Regex101 test

I'm using it in a function, and it returns an empty array. Does anyone have an idea what's wrong with my function?

function extractEmbedYT($str) {
    preg_match('/(\bhttps?:)?\/\/[^,\s()<>]+(?:\([\w\d]+\)|(?:[^,[:punct:]\s]|\/))/', $str, $matches, PREG_OFFSET_CAPTURE, 0);
    return $matches;
}

EDIT 1: Changed the capture group in my regex so it doesn't capture the last character.

EDIT 2: Added some PHP code for context, since the regex works in Regex101 but not in my script.

Jonathan Lafleur
  • If you do not need that capture, why use a capturing group? Replace `([^,[:punct:]\s]|\/)` with a non-capturing group - [`(?:[^,[:punct:]\s]|\/)`](https://regex101.com/r/vY6eV7/46). – Wiktor Stribiżew Sep 12 '18 at 13:53
  • Hmm, that works in Regex101, but not in my function... I don't understand why exactly. Can you take a look here https://3v4l.org/fmG1W ? I'll update my question accordingly, and if you post an answer I'll accept it. – Jonathan Lafleur Sep 12 '18 at 14:09
  • In your code, you have `$string` variable but you pass `$str` to the `extractEmbedYT` function - see https://3v4l.org/1OFDf – Wiktor Stribiżew Sep 12 '18 at 14:17
  • OMG... Wow, thank you Wiktor! I hadn't spotted it. – Jonathan Lafleur Sep 12 '18 at 14:19

1 Answer

You need to convert the capturing group to a non-capturing one:

/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|(?:[^,[:punct:]\s]|\/))/s
                                       ^^^
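
To see what that change does in practice, here is a minimal sketch that runs both versions of the pattern through preg_match. The variable names ($html, $capturing, $nonCapturing, $withGroup, $withoutGroup) are mine, not from the question, and the sample string is a shortened copy of the one posted.

```php
<?php
// Shortened copy of the sample markup from the question.
$html = '<p><iframe width="560" height="315" src="//www.youtube.com/embed/PZlJFGgFTfA" frameborder="0" allowfullscreen=""></iframe></p>';

// Trailing alternative written as a capturing group: the last character
// of the URL is also stored as group 2.
$capturing = '/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|([^,[:punct:]\s]|\/))/';

// Same pattern with that group made non-capturing.
$nonCapturing = '/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|(?:[^,[:punct:]\s]|\/))/';

preg_match($capturing, $html, $withGroup);
preg_match($nonCapturing, $html, $withoutGroup);

// Index 0 holds the full match in both cases; only the first result
// carries an extra entry for the trailing group.
print_r($withGroup);
print_r($withoutGroup);
```

In both cases index 0 should hold the full URL. The only difference is the extra group entry, so the non-capturing form just keeps the result array free of a value you don't need.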

Also, in the code, you need to pass $string to the function, not $str:

function stripEmptyTags ($result)
{
    $regexps = array (
        '~<(\w+)\b[^\>]*>([\s]|&nbsp;)*</\\1>~',
        '~<\w+\s*/>~',
    );

    do
    {
        $string = $result;
        $result = preg_replace ($regexps, '', $string);
    }
    while ($result != $string);

    return $result;
}


function extractEmbedYT($str) {
    // Find all URLS in $str

    preg_match_all('/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|(?:[^,[:punct:]\s]|\/))/s', $str, $matches);

    // Remove all iframes from $str
    $str = preg_replace('/<iframe.*?<\/iframe>/i','', $str);


    $str = stripEmptyTags($str);
    return [$str, $matches[0]];
}

$string = '<p>

</p><p>UDA Stagiaire</p><p><iframe width="560" height="315" src="//www.youtube.com/embed/PZlJFGgFTfA" frameborder="0" allowfullscreen=""></iframe></p>';

$results = extractEmbedYT($string);

print_r($results);

See the online PHP demo.
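
As a side note, a wrong variable name like this is easy to miss because an unset variable is passed along as null and the regex functions simply find nothing. One way to make the mistake loud is a parameter type declaration. This is only a sketch under my own assumptions: extractEmbedUrls is a hypothetical, trimmed-down variant of your function (it only collects the URLs), and null stands in for the undefined $str.

```php
<?php
// Hypothetical variant of extractEmbedYT() with a string type declaration,
// so a null argument fails immediately instead of yielding no matches.
function extractEmbedUrls(string $html): array
{
    preg_match_all(
        '/(\bhttps?:)?\/\/[^,\s()<>]+(?:\(\w+\)|(?:[^,[:punct:]\s]|\/))/',
        $html,
        $matches
    );

    // $matches[0] lists every full match found in $html.
    return $matches[0];
}

$string = '<p>Garbage text</p><p><iframe width="560" height="315" src="//www.youtube.com/embed/PZlJFGgFTfA" frameborder="0" allowfullscreen=""></iframe></p>';

// Correct call: pass the variable that actually holds the markup.
$urls = extractEmbedUrls($string);

// Simulated mistake: null is what an undefined variable evaluates to.
$caught = false;
try {
    extractEmbedUrls(null);
} catch (TypeError $e) {
    $caught = true;
}
```

With the declaration in place, the bad call throws a TypeError right at the call site rather than quietly returning an empty result.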

Wiktor Stribiżew