I am trying to download a file in PHP.

$file = file_get_contents($url);

How should I download the contents of the links within the file at $url?

Andy Lester
El Classico
  • Download links by calling file_get_contents, passing the link as an argument. – Oswald Jan 06 '11 at 15:25
  • possible duplicate of [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Jan 06 '11 at 15:27

3 Answers

This requires parsing HTML, which is awkward to do by hand in PHP. To save yourself a lot of trouble, download an HTML parsing library such as phpQuery (http://code.google.com/p/phpquery/). Then select all the links with pq('a'), loop through them reading each href attribute, convert each value from a relative to an absolute URL, and call file_get_contents on the resulting URL. These pointers should get you started.
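
As a rough sketch of the steps described above, here is a version that uses PHP's bundled DOM extension (DOMDocument) instead of phpQuery, so it illustrates the same idea with a swapped-in technique rather than the phpQuery API. The function names extractLinks and toAbsoluteUrl are hypothetical, and the URL resolution is deliberately simplified: it does not normalize ../ segments or honor a <base> tag.

```php
<?php
// Hypothetical helper: collect the href value of every <a> element in an HTML string.
function extractLinks(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is often malformed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip anchors without an href and links that only point inside the page.
        if ($href !== '' && $href[0] !== '#') {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical helper: simplified relative-to-absolute conversion.
// Assumption: $base is an absolute http(s) URL such as the page that was downloaded.
function toAbsoluteUrl(string $href, string $base): string
{
    if ($href === '') {
        return $base;
    }
    // Already absolute: starts with a scheme such as http: or mailto:
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }
    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (strpos($href, '//') === 0) {   // protocol-relative link
        return $scheme . ':' . $href;
    }
    if ($href[0] === '/') {            // root-relative link
        return $origin . $href;
    }
    // Relative to the directory of the base document.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Usage sketch (not executed here because it needs network access):
// $html = file_get_contents($url);
// foreach (extractLinks($html) as $href) {
//     $absolute = toAbsoluteUrl($href, $url);
//     if (preg_match('#^https?://#i', $absolute)) {
//         $content = file_get_contents($absolute);
//     }
// }
```

Links with other schemes (mailto:, javascript:) are returned unchanged by toAbsoluteUrl, which is why the usage sketch only fetches URLs that start with http:// or https://.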

Nathan MacInnes

So you want to find all URLs in a given file? Regex to the rescue. The sample code below should do what you want:

$file = file_get_contents($url);
if ($file === false) return; // the download failed

// extract the hyperlinks from the file via regex (matches http and https)
preg_match_all("/https?:\/\/[A-Z0-9_\-\.\/\?\#\=\&]*/i", $file, $urlmatches);

// $urlmatches[0] holds the full matches; it is empty if no URL was found
$urlmatches = $urlmatches[0];
if (count($urlmatches) > 0) {
    // count number of URLs
    $numberofmatches = count($urlmatches);
    echo "Found $numberofmatches URLs in $url\n";

    // write all found URLs line by line
    foreach ($urlmatches as $urlmatch) {
        echo "URL: $urlmatch...\n";
    }
}

EDIT: If I understand your question correctly, you now want to download the contents of the URLs that were found. You would do that in the foreach loop by calling file_get_contents for each URL, but you probably want to filter the URLs beforehand (for example, skip images).
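
A minimal sketch of that filtering step might look like the following. The function names shouldDownload and downloadAll and the list of skipped extensions are assumptions made for illustration; checking the file extension is only a heuristic, since a URL's extension does not guarantee the content type the server returns.

```php
<?php
// Hypothetical filter: decide from the URL path's extension whether a link is worth fetching.
function shouldDownload(
    string $url,
    array $skipExtensions = ['png', 'jpg', 'jpeg', 'gif', 'ico', 'css', 'js']
): bool {
    $path = parse_url($url, PHP_URL_PATH);
    if ($path === false) {
        return false;   // seriously malformed URL
    }
    if ($path === null) {
        return true;    // no path component, e.g. http://example.com
    }
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return !in_array($extension, $skipExtensions, true);
}

// Hypothetical downloader: fetch every URL that passes the filter.
// Returns an array mapping URL => downloaded content; failed downloads are skipped.
function downloadAll(array $urls): array
{
    $pages = [];
    foreach (array_unique($urls) as $candidate) {
        if (!shouldDownload($candidate)) {
            continue;
        }
        $content = @file_get_contents($candidate);
        if ($content !== false) {
            $pages[$candidate] = $content;
        }
    }
    return $pages;
}

// Usage sketch: pass in the list of URLs found by the regex above.
// $pages = downloadAll($urlmatches);
```

array_unique keeps the same URL from being fetched twice when a page links to it more than once.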

Dennis G

You'll need to parse the resulting HTML string, either manually or with a third-party library.

HTML Scraping in Php

Community
Dutchie432