How to use regex to grab URLs and then echo URLs from website?

Question

Just for fun, I'm trying to code something that will scan a website for YouTube URLs and save them. The URLs will not be within tags, so I need to use regex. I have that part down. But how do I go about echoing the URLs from an array?

What I have so far:

<?php

$website = file_get_contents('http://boards.4chan.org/mu/res/41283979');
$reg_exURL = "/(?:https?://)?(?:www\.)?youtu(?:be\.com/watch\?(?:.*?&(?:amp;)?)?v=|\.be/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?/";

if(preg_match($reg_exURL, $website, $urls)) {
    // Echo all values in the array
    foreach ($urls as $url) {
        echo $url;
    }
} else {
    echo "No URLs Found.";
}

?>

But when I echo $url, I just get the word "Array". I want to see all the URLs, preferably one on each line.

possible duplicate of [How to get all urls from page (php)](http://stackoverflow.com/questions/1128774/how-to-get-all-urls-from-page-php) — Andy Lester, Nov 05 '13 at 02:33
No it isn't because those answers rely on tags. I can't rely on such tags. I need a regex answer. — Jason, Nov 05 '13 at 02:35

score 1 · Accepted Answer · answered Nov 05 '13 at 05:05

Notice that 4chan inserts <wbr> tags into the YouTube IDs, probably as a safeguard against scraping like this. You have to strip those tags from the source first with a string replace.

Then you can use a regex to match all the links in the source, keeping in mind that a YouTube video ID consists of letters, digits, _ and -, and is always 11 characters long.

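// Strip the <wbr> tags that break up the video IDs.
$website = str_replace("<wbr>", "", file_get_contents('http://boards.4chan.org/mu/res/41283979'));

// Optional scheme and "www.", then a watch URL with an 11-character ID.
$regex = "/(https?:\/\/)?(?:www\.)?youtube\.com\/watch\?v=[A-Za-z0-9_-]{11}/";
preg_match_all($regex, $website, $urls, PREG_SET_ORDER);

// With PREG_SET_ORDER, each $url is one match set; index 0 is the full match.
foreach ($urls as $url) {
    echo $url[0] . "<br>";
}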
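For anyone who wants to try the approach without hitting the network, here is a minimal sketch of the same idea run against an inline string. The function name `extractYoutubeUrls` and the sample text are made up for illustration; the pattern is the one above, only written with `#` as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper (not from the answer): returns the YouTube
// watch URLs found in $html as a flat list of strings.
function extractYoutubeUrls(string $html): array
{
    // Remove the <wbr> tags that split the video IDs.
    $clean = str_replace('<wbr>', '', $html);

    // Same pattern as above, using '#' as the regex delimiter.
    $regex = '#(?:https?://)?(?:www\.)?youtube\.com/watch\?v=[A-Za-z0-9_-]{11}#';

    // With the default PREG_PATTERN_ORDER flag, $matches[0] is a flat
    // list of every full match.
    preg_match_all($regex, $clean, $matches);

    return $matches[0];
}

// Made-up sample standing in for the downloaded page.
$sample = 'first http://www.youtube.com/watch?v=lHjN<wbr>myzrVvM '
        . 'second www.youtube.com/watch?v=abcdefghijk end';

// Print one URL per line.
foreach (extractYoutubeUrls($sample) as $url) {
    echo $url, "\n";
}
```

Returning `$matches[0]` keeps the caller's loop simple, since every element is already a string rather than a nested match set.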

I had to replace your "<br>" with a "\n", but this worked perfectly and is exactly what I wanted. It also seems a lot more compact than the other answers. – Jason, Nov 06 '13 at 18:50

score 0 · Answer 2 · answered Nov 05 '13 at 02:20

print_r is used to output arrays:

http://php.net/manual/en/function.print-r.php

scrowler

Using "print_r($url);" doesn't get any of the links I want. It seems to grab the very first link in an <a> tag. – Jason Nov 05 '13 at 02:23
Correct: `$url` contains an array, so `echo` prints 'Array' – scrowler Nov 05 '13 at 02:25
Update your regex, have a look at http://stackoverflow.com/questions/3717115/regular-expression-for-youtube-links – scrowler Nov 05 '13 at 02:26
Thanks, that will come in handy for the future, but it doesn't solve my issue. – Jason Nov 05 '13 at 02:37

score 0 · Answer 3 · answered Nov 05 '13 at 02:20

You can just use print_r($url) or var_dump($url). These are standard ways to print arrays.
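As a rough illustration of why a bare echo can print "Array", here is a small sketch. It assumes the matches were collected with preg_match_all and its default flag, which is an assumption on my part (the code in the question calls preg_match); the sample text and pattern are made up.

```php
<?php
// Made-up input containing two links.
$text = 'a http://example.com/one b http://example.com/two';

// Default flag (PREG_PATTERN_ORDER): $matches[0] is itself an array
// holding every full match, so each element of $matches is an array.
preg_match_all('#https?://\S+#', $text, $matches);

// Dump the nested structure to see what a loop over $matches visits.
print_r($matches);

// Loop over the inner array to reach the actual strings.
foreach ($matches[0] as $url) {
    echo $url, "\n";
}
```

print_r and var_dump are handy for inspecting the structure, but for the final output you still need to loop over the level that holds the strings.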

Johnride

blacktide · Answer 4 · 2013-11-05T02:39:26.550

You can do this using a foreach loop.

<?php

$website = file_get_contents('http://boards.4chan.org/mu/res/41283979');
$reg_exURL = "/(?:https?:\/\/)?(?:www\.)?youtu(?:be\.com\/watch\?(?:.*?&(?:amp;)?)?v=|\.be\/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?/";
if(preg_match($reg_exURL, $website, $urls)) {
    // Echo all values in the array
    foreach ($urls as $url) {
        echo $url;
    }
} else {
    echo "No URLs Found.";
}

?>

edited Nov 05 '13 at 02:39

answered Nov 05 '13 at 02:22

I updated the original post with your edits, but I'm still not getting anywhere. I added the URL I'm trying to scrape if it helps. – Jason Nov 05 '13 at 02:32
I didn't notice before, but it looks like there was an error in the regex. The forward slashes needed to be escaped. I've updated the answer with the working example. – blacktide Nov 05 '13 at 02:39
For some reason, the URL that is echoed is "http://www.youtube.com/watch?v=lHjNlHjN" when it should be echoing "http://www.youtube.com/watch?v=lHjNmyzrVvM". Also, do you know how to echo the rest of the URLs on the page? I'm sorry for sounding so ignorant. – Jason Nov 05 '13 at 02:44
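One plausible reading of the "lHjNlHjN" output, sketched below: preg_match fills the array with the full match followed by the captured group, and the loop echoes both with nothing in between. The sample string is made up to imitate an ID split by a <wbr> tag; the pattern is the one from this answer. preg_match also stops after the first match, which would explain why the other URLs never appear.

```php
<?php
// Made-up fragment imitating how the page splits an ID with <wbr>.
$html = 'http://www.youtube.com/watch?v=lHjN<wbr>myzrVvM';

// The pattern from the answer, in single quotes so the backslashes
// reach the regex engine unchanged.
$pattern = '/(?:https?:\/\/)?(?:www\.)?youtu(?:be\.com\/watch\?(?:.*?&(?:amp;)?)?v=|\.be\/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?/';

// preg_match stops at the first match. The ID match ends at the "<"
// of <wbr>, so only the first part of the ID is captured.
preg_match($pattern, $html, $urls);

// $urls[0] is the full match and $urls[1] is the captured ID part.
// Joining them with no separator gives the same text as the loop of
// bare echo calls.
$echoed = implode('', $urls);
echo $echoed, "\n";
```

Under that reading, stripping the <wbr> tags first and switching to preg_match_all addresses both symptoms.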
