0

Just for fun, I'm trying to code something that will scan a website for YouTube URLs and save them. The URLs will not be within tags, so I need to use regex. I have that part down. But how do I go about echoing the URLs from an array?

What I have so far:

<?php

$website = file_get_contents('http://boards.4chan.org/mu/res/41283979');
$reg_exURL = "/(?:https?://)?(?:www\.)?youtu(?:be\.com/watch\?(?:.*?&(?:amp;)?)?v=|\.be/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?/";

if(preg_match($reg_exURL, $website, $urls)) {
    // Echo all values in the array
    foreach ($urls as $url) {
        echo $url;
    }
} else {
    echo "No URLs Found.";
}

?>

But when I echo $url, I just get the word "Array". I want to see all the URLs, preferably one on each line.

Jason

4 Answers

1

Notice that 4chan inserts <wbr> tags inside the YouTube IDs, probably as a safeguard against scraping like this. You have to strip those tags from the source first with a string replace.

Then you can use a regex to match all links in the source, keeping in mind that a YouTube video ID consists of letters, numbers, _ and -, and is always 11 characters long.

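// Strip the <wbr> tags first so each video ID is contiguous.
$website = str_replace("<wbr>", "", file_get_contents('http://boards.4chan.org/mu/res/41283979'));

// A video ID is exactly 11 characters from [A-Za-z0-9_-].
$regex = "/(https?:\/\/)?(?:www\.)?youtube\.com\/watch\?v=[A-Za-z0-9_-]{11}/";
preg_match_all($regex, $website, $urls, PREG_SET_ORDER);

// With PREG_SET_ORDER, each $url is one match and $url[0] is the full matched URL.
foreach ($urls as $url) {
    echo $url[0] . "<br>";
}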
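
Since the live page may no longer exist, the same idea can be tried against a hard-coded string. This is only a sketch: the function name extract_youtube_urls and the sample markup are made up for illustration, and it uses # as the regex delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper: the name and the sample markup below are illustrative only.
function extract_youtube_urls(string $html): array
{
    // Strip the <wbr> tags that would otherwise split a video ID in two.
    $clean = str_replace('<wbr>', '', $html);

    // Same pattern as above, but with # as the delimiter and no capture group.
    $regex = '#(?:https?://)?(?:www\.)?youtube\.com/watch\?v=[A-Za-z0-9_-]{11}#';

    // Without PREG_SET_ORDER, $matches[0] is a flat list of the full matches.
    preg_match_all($regex, $clean, $matches);

    return $matches[0];
}

// Assumed input: one ID split by <wbr>, one plain link.
$sample = 'first http://www.youtube.com/watch?v=lHjN<wbr>myzrVvM and '
        . 'second https://youtube.com/watch?v=abcdefghijk here';

// Print one URL per line.
echo implode("\n", extract_youtube_urls($sample)), "\n";
```

Returning the flat list keeps the printing separate from the matching, so the caller can join with "\n" for a terminal or "<br>" for a browser.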
dljve
  • I had to replace your "<br>" with a "\n", but this worked perfectly and is exactly what I wanted. It also seems a lot more compact than the other answers. – Jason Nov 06 '13 at 18:50
0

print_r is used to output arrays:

http://php.net/manual/en/function.print-r.php
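
As a rough illustration of what print_r shows for the kind of array that preg_match_all fills in, here is a small sketch. The input string and the simplified youtu.be pattern are assumptions made up for the example, not the asker's data.

```php
<?php
// Made-up input for illustration; not taken from the page in the question.
$sample = 'a http://youtu.be/abcdefghijk b http://youtu.be/ABCDEFGHIJK';

// preg_match_all fills $matches with an array of arrays.
preg_match_all('#https?://youtu\.be/[\w-]{11}#', $sample, $matches);

// echo on an element that is itself an array prints the word "Array".
// print_r() shows the nested structure; passing true as the second
// argument returns the dump as a string instead of printing it.
$dump = print_r($matches, true);
echo $dump;
```

The dump makes it visible that the URLs sit one level down, in $matches[0], which is the element to loop over when echoing them.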

scrowler
0

You can just use print_r($url) or var_dump($url). These are standard ways to print arrays.

Johnride
0

You can do this using a foreach loop.

<?php

$website = file_get_contents('http://boards.4chan.org/mu/res/41283979');
$reg_exURL = "/(?:https?:\/\/)?(?:www\.)?youtu(?:be\.com\/watch\?(?:.*?&(?:amp;)?)?v=|\.be\/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?/";
if(preg_match($reg_exURL, $website, $urls)) {
    // Echo all values in the array
    foreach ($urls as $url) {
        echo $url;
    }
} else {
    echo "No URLs Found.";
}

?>
blacktide
  • I updated the original post with your edits, but I'm still not getting anywhere. I added the URL I'm trying to scrape if it helps. – Jason Nov 05 '13 at 02:32
  • I didn't notice before, but it looks like there was an error in the regex. The forward slashes needed to be escaped. I've updated the answer with the working example. – blacktide Nov 05 '13 at 02:39
  • For some reason, the URL that is echoed is "http://www.youtube.com/watch?v=lHjNlHjN" when it should be echoing "http://www.youtube.com/watch?v=lHjNmyzrVvM". Also, do you know how to echo the rest of the URLs on the page? I'm sorry for sounding so ignorant. – Jason Nov 05 '13 at 02:44
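
The doubled ID in the last comment is consistent with how preg_match fills its array: index 0 holds the whole match and index 1 holds the first capture group, so echoing every element prints the captured ID a second time, and a <wbr> inside the ID ends the match early. A minimal sketch, assuming the page contained the ID split as lHjN<wbr>myzrVvM (a guess; the original page is not shown) and using a simplified version of the pattern:

```php
<?php
// Assumed page fragment; where the <wbr> falls is a guess based on the comment above.
$html = 'http://www.youtube.com/watch?v=lHjN<wbr>myzrVvM';

// Simplified pattern with one capture group for the video ID.
$regex = '#(?:https?://)?(?:www\.)?youtube\.com/watch\?v=([\w-]+)#';

// preg_match stops after the first match: $urls[0] is the full match,
// $urls[1] is the captured ID, and both end at the "<" of <wbr>.
preg_match($regex, $html, $urls);

// Echoing every element in a loop is the same as concatenating them.
$echoed = implode('', $urls);
echo $echoed, "\n";

// Removing <wbr> first and reading only index 0 gives the full URL.
preg_match($regex, str_replace('<wbr>', '', $html), $fixed);
echo $fixed[0], "\n";
```

To list every URL on the page rather than only the first, preg_match_all would take the place of preg_match, with the loop running over the full matches only.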