1

I want to extract YouTube URL strings such as https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 from a text, together with the video ID (here 0EB7zh_7UE4), so I can inject text after each URL based on its video ID. This is my sample text:

This is an example page will show up https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 Bike https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be&app=desktop messenger by day, aspiring actor by night, and this is my website. I live in https://youtu.be/1EB7zh_7UE4 Los Angeles, have a great dog named Jack, and I https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be like piña coladasdoohickeys https://www.youtube.com/watch?v=4EB7zh_7UE4 you should go to <a href="http://example.com/wp-admin/">your dashboard</a> to delete this page and create new pages for your content. Have fun!

https://www.youtube.com/watch?v=0EB7zh_7UE4

more

https://www.youtube.com/watch?v=2EB7zh_7UE4&feature=youtu.be

That\'s all..

This is the regex I have so far, but it has the following problems:

  • it adds the (here) string in the middle of the link, before the end of the URL. I want (here) added at the end of the YouTube URL string

  • it injects (here) multiple times

See code:

function regex($sample_text) {
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])(.*?)\b#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $sample_text);
        }
    }
    return $processed_text;
}
echo regex($sample_test);

Where is my mistake?

Note: question + sample text have been updated.

cristalix
  • What do you mean by "it duplicates ID value"? What is the expected output, and the output you see? – IMSoP Feb 19 '18 at 16:28
  • @Syscall It's not generic code; I edited it for stackoverflow.com and missed that. The question has been fixed. – cristalix Feb 19 '18 at 16:33
  • $processed_text is being reset from the $sample_text each time, not a running replacement for each value. – Jamie - Fenrir Digital Ltd Feb 19 '18 at 16:35
  • @IMSoP I want to inject ` (here)` at the end of the YouTube URL string; currently it gets added in the middle. – cristalix Feb 19 '18 at 16:36
  • @EvilGeniusJamie Yes! That's part of the mistake. That's why it returns multiple injections. So bright! Thank you. – cristalix Feb 19 '18 at 16:37
  • @EvilGeniusJamie but the multiple str_replace is partially a result of the wrong regex – cristalix Feb 19 '18 at 16:51
  • So all you want is to retrieve ID from any string containing youtube link? – Richard Feb 19 '18 at 17:00
  • Is this really **text** or some **html source** - if the latter is the case, you could use some xpath queries instead. – Jan Feb 19 '18 at 18:40
  • @Richard I want to extract the YouTube URL string `https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4` from the text, together with the video ID `0EB7zh_7UE4`, so I can inject text after the URL based on the video ID. – cristalix Feb 20 '18 at 08:45

5 Answers

1

To expand on my comment: on every iteration you build the result from the original string, $sample_text, so each replacement discards the previous ones. The fix is simple: initialise $processed_text at the start and keep working on that.

function regex($sample_text) {
    $processed_text = $sample_text;
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])(.*?)\b#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $processed_text);
        }
    }
    return $processed_text;
}
echo regex($sample_test);

Your regex is also not matching to the end of the URL. For the purposes of the sample text you provided, you could match up to anything that isn't whitespace:

'#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s'

However, \S* also swallows characters like " or . when they directly follow the URL; you could stop at those by excluding them with a negated character class. You seem to have a pretty good grasp of regex, so I'll assume you can work this out. If not, comment and I'll update my answer.
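One way to keep trailing punctuation out of the match, sketched under the assumption that a URL ends at whitespace, a double quote, or an angle bracket, and that a final `.`, `,`, `;`, `:`, `!`, `?` or `)` belongs to the sentence rather than to the URL. The function name `tag_youtube_urls` and the simplified pattern (only `youtube.com/watch` and `youtu.be` links) are illustrative and not part of the original code:

```php
<?php
// Hypothetical helper: appends ' (here)' after each YouTube URL.
// Assumption: only youtube.com/watch?...v=ID and youtu.be/ID links matter.
function tag_youtube_urls(string $text): string
{
    $pattern = '#https?://(?:www\.|m\.)?'
             . '(?:youtu\.be/|youtube\.com/watch\?(?:[^\s"<>]*?&)?v=)'
             . '([-\w]{11})'                    // the 11-character video id
             . '(?:[^\s"<>]*[^\s"<>.,;:!?)])?'  // rest of the URL, not ending in punctuation
             . '#';

    // $0 is the whole matched URL.
    return preg_replace($pattern, '$0 (here)', $text);
}

echo tag_youtube_urls('See https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be. Done');
```

With this pattern the full stop after `youtu.be` stays outside the match, so the suffix lands between the URL and the sentence punctuation.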


For completeness' sake, here is the full code with my regex:

function regex($sample_text) {
    $processed_text = $sample_text;
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $processed_text);
        }
    }
    return $processed_text;
}
echo regex($sample_test);
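
A caveat with the loop above: `str_replace` replaces every occurrence of the search string. When one matched URL is a prefix of another (for example `...watch?v=0EB7zh_7UE4` and `...watch?v=0EB7zh_7UE4&feature=youtu.be` in the sample text), the shorter one is also tagged inside the longer one, and a URL that appears several times is tagged several times. A minimal sketch that avoids this by replacing each match where it occurs, in a single pass, using `preg_replace_callback`. The function name `append_after_youtube_urls` and the shortened pattern are assumptions made for illustration:

```php
<?php
// Hypothetical helper: tags each YouTube URL exactly once, where it occurs.
function append_after_youtube_urls(string $text, string $suffix = ' (here)'): string
{
    // Shortened pattern (assumption): watch?...v=ID and youtu.be/ID links only.
    $pattern = '#https?://(?:www\.)?(?:youtu\.be/|youtube\.com/watch\?(?:\S*?&)?v=)([-\w]{11})\S*#';

    return preg_replace_callback(
        $pattern,
        // $m[0] is the whole URL, $m[1] the video id (unused here, but
        // available if the injected text should depend on it).
        function (array $m) use ($suffix): string {
            return $m[0] . $suffix;
        },
        $text
    );
}

$text = "a https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be b\n"
      . "https://www.youtube.com/watch?v=0EB7zh_7UE4";
echo append_after_youtube_urls($text);
```

Because the callback sees one match at a time, the shorter URL on the second line does not affect the longer one on the first.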
1
<?php

$str = 'This is an example page will show up https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 Bike https://www.youtube.com/watch?v=1EB7zh_7UE4&feature=youtu.be&app=desktop messenger by day, aspiring actor by night, and this is my website. I live in https://youtu.be/2EB7zh_7UE4 Los Angeles, have a great dog named Jack, and I https://www.youtube.com/watch?v=3EB7zh_7UE4&feature=youtu.be like piña coladasdoohickeys https://www.youtube.com/watch?v=4EB7zh_7UE4 you should go to <a href="http://example.com/wp-admin/">your dashboard</a> to delete this page and create new pages for your content. Have fun!

https://www.youtube.com/watch?v=5EB7zh_7UE4

more

https://www.youtube.com/watch?v=6EB7zh_7UE4&feature=youtu.be

That\'s all.';

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $str, $match);

// youtube vid ID array placeholder
$youtubeVids = array();

// Going through each URL and retrieving the video ID
foreach($match[0] as $url)
{
    // Parsing the URL
    $url = parse_url($url);

    // Retrieving the query if they exist
    if(isset($url['query']))
    {
        parse_str($url['query'], $yt_vid);
    }

    // Checking if we have the query parts
    if(isset($yt_vid['v']))
    {
        // Adding the vid ID to the vid ID list
        $youtubeVids[] = $yt_vid['v'];
    }
    else
    {
        // No queries, checking if we are checking a youtube vid (maybe regex better?)
        if(stripos($url['host'], 'youtu') !== false)
        {
            // Adding the ID to ID list (This is mainly for links like youtube.com/6EB7zh_7UE4 or youtu.be/6EB7zh_7UE4)
            $youtubeVids[] = substr($url['path'], 1);
        }
    }

    // Unsetting so it won't be set in the next loop
    unset($yt_vid);
}

print_r($youtubeVids);
?>

Outputs

Array
(
    [0] => 0EB7zh_7UE4
    [1] => 1EB7zh_7UE4
    [2] => 2EB7zh_7UE4
    [3] => 3EB7zh_7UE4
    [4] => 4EB7zh_7UE4
    [5] => 5EB7zh_7UE4
    [6] => 6EB7zh_7UE4
)

I found the following solution on the net though.

preg_match_all('/(?:youtube(?:-nocookie)?\.com\/(?:[^\/\n\s]+\/\S+\/|(?:v|e(?:mbed)?)\/|\S*?[?&]v=)|youtu\.be\/)([a-zA-Z0-9_-]{11})\W/', $str, $match);
print_r($match);
ezw
  • You did it the dirty way, but it works! I'll keep the question open for a while in case somebody posts a better solution, but yours is the best for now. – cristalix Feb 19 '18 at 17:02
0

You could use

https?://\S+?\Qyoutube.com\E\S+?v=\K[^&\s]+

See a demo on regex101.com.
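
For readers unfamiliar with the constructs used: `\Q...\E` treats the enclosed text literally (so the dot in `youtube.com` needs no escaping), and `\K` discards everything matched so far, leaving only the ID as the reported match. A short sketch of how the pattern might be used with `preg_match_all`; the `~` delimiter and the sample string are choices made here, and note that the pattern as written does not cover `youtu.be` links:

```php
<?php
$text = 'x https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 y '
      . 'https://www.youtube.com/watch?v=4EB7zh_7UE4&feature=youtu.be z '
      . 'https://youtu.be/1EB7zh_7UE4';

// Because of \K, $matches[0] holds just the ids, not the full URLs.
preg_match_all('~https?://\S+?\Qyoutube.com\E\S+?v=\K[^&\s]+~', $text, $matches);

print_r($matches[0]);
```

Here the two `youtube.com` links yield their IDs, while the `youtu.be` link is skipped.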

Jan
0

Just for the record, I ended up with this "simple" function based on this:

function filter($content) {
    return preg_replace_callback('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s', function($match) {
        return sprintf('%s my replace with 2nd parameter found %s', $match[0], $match[1]);
    }, $content);
}
cristalix
0

This is what has been working for me:

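function FindYouTubeId($url)
{
    // Capture the 11-character id from youtube.com, youtube-nocookie.com and youtu.be URLs.
    $pattern = '%(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i';
    // Return null instead of reading an undefined offset when nothing matches.
    if (preg_match($pattern, $url, $match)) {
        return $match[1];
    }
    return null;
}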
Debbie Kurth