I am trying to get a list of files from a directory through PHP. I tried glob, but it doesn't work over HTTP. I also tried recursing, and this is the latest script I managed to find. It doesn't work either: it doesn't display the files.

<?php
$url = 'removed for security purposes';
$html = file_get_contents($url);
$count = preg_match_all('/<td><a href="([^"]+)">[^<]*<\/a><\/td>/i', $html, $files);
for ($i = 0; $i < $count; ++$i) {
  echo "File: " . $files[1][$i] . "<br />\n";
}
var_dump($files);
?>

The output of var_dump($files); is:

array(2) {
  [0]=> array(0) { }
  [1]=> array(0) { }
}

So what am I doing wrong?

Adrian
  • **Don't use regular expressions to parse HTML. Use a proper HTML parsing module.** You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php or [this SO thread](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester Apr 09 '14 at 14:20
  • @Jurik Well, I don't know where to look. Everywhere I look covers only local paths (scandir, glob....) – Adrian Apr 09 '14 at 14:20
  • @Jurik Then explain to him briefly – Poomrokc The 3years Apr 09 '14 at 14:20
  • Besides this lack of basic knowledge, I would suggest you open `$url` in your browser and take a quick look at the source code. Hint: td != li – Jurik Apr 09 '14 at 14:20
  • Do you know that `$html` is coming back with actual HTML code that you're expecting? Did you try printing it out to make sure that your `file_get_contents()` does what you expect? – Andy Lester Apr 09 '14 at 14:21
  • Aside from trying to parse HTML using regex: there is not a single `<td>` in the target page. It's probably `<ul>` and `<li>` you're targeting. – haim770 Apr 09 '14 at 14:21
  • @AndyLester I agree. I tried this one, and the URL must point to a specific file like www.someweb.com/something.html. If it is being redirected with .htaccess, it returns nothing – Poomrokc The 3years Apr 09 '14 at 14:22
  • lol, I did a var_dump to html, and I can see the directory – Adrian Apr 09 '14 at 14:22
  • But you did not see that it is a list and not a table? ;) – Jurik Apr 09 '14 at 14:24
  • @Jurik Yep, you're right, it was the li – Adrian Apr 09 '14 at 14:26
  • @PoomrokcThe3years I thought this is a Q&A portal and not a "I do not know what I am doing, please teach me" portal. Additionally I really appreciate his approach with regular expressions - it's always a good thing to do this from time to time and increase regexp skill :) – Jurik Apr 09 '14 at 14:27
  • Oh and no - I am not saying he does not know what he is doing. So my first comment was wrong - it was wrong as long as his script does not run on the same domain ;) – Jurik Apr 09 '14 at 14:28
  • @user3467855 - just a question, is this the same domain where your script is running or another server? – Jurik Apr 09 '14 at 14:36
  • @Jurik Another Server. – Adrian Apr 09 '14 at 17:43
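
To make the point from the comments concrete, here is a minimal sketch of why the original pattern finds nothing. The `$html` string is a hypothetical sample of a list-based directory index (the real page markup was never posted), and the variable names are only for illustration.

```php
<?php
// Hypothetical sample: a directory index rendered as a list, not a table.
$html = '<ul>'
      . '<li><a href="a.png"> a.png</a></li>'
      . '<li><a href="docs/"> docs/</a></li>'
      . '</ul>';

// The pattern from the question expects table cells.
$tdCount = preg_match_all('/<td><a href="([^"]+)">[^<]*<\/a><\/td>/i', $html, $tdMatches);

// The same pattern with <li> instead of <td>.
$liCount = preg_match_all('/<li><a href="([^"]+)">[^<]*<\/a><\/li>/i', $html, $liMatches);

echo "td matches: $tdCount\n"; // td matches: 0
echo "li matches: $liCount\n"; // li matches: 2
```

An empty result from preg_match_all() therefore does not mean the download failed; dumping `$html` first shows which tags are actually present.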

3 Answers

Your page contains lists, not tables, so the pattern has to match <li> elements instead of <td>:

   <?php
   $url = 'http://www.seoadsem.com/opencart';
   $html = file_get_contents($url);
   $count = preg_match_all('/<li><a href="([^"]+)">[^<]*<\/a><\/li>/i', $html, $files);
   for ($i = 0; $i < $count; ++$i) {
     echo "File: " . $files[1][$i] . "<br />\n";
   }
   var_dump($files);
   ?>
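
If you would rather follow the advice from the comments and avoid regular expressions, a rough sketch with PHP's built-in DOM extension could look like the following. The function name `list_links` and the sample markup are made up for illustration, and the XPath assumes the links sit directly inside `<li>` elements, which may not hold for every directory index.

```php
<?php
/**
 * Return the href of every <a> that is a direct child of an <li>.
 * Assumes a list-based directory index; adjust the XPath for other layouts.
 */
function list_links(string $html): array
{
    $doc = new DOMDocument();

    // Index pages are often not valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//li/a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Hypothetical sample markup standing in for the downloaded page.
$sample = '<ul><li><a href="a.png">a.png</a></li><li><a href="docs/">docs/</a></li></ul>';
print_r(list_links($sample));
```

Unlike the regex, this keeps working if the server adds attributes to the tags or whitespace between them.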
user3383116

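file_get_contents() can only fetch URLs when allow_url_fopen is enabled in php.ini, and some hosts disable it for security reasons, so it may work for local files only. If that is the case, use cURL instead. This may save you a lot of debugging time.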

See PHP cURL vs file_get_contents.
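
A minimal sketch of the cURL route, assuming the curl extension is installed. The function name `fetch_html` and the 10-second timeout are arbitrary choices for illustration, not something from the question.

```php
<?php
/**
 * Download a page with cURL and return its body as a string.
 * Throws a RuntimeException if the transfer fails.
 */
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; arbitrary value
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

You would then call `$html = fetch_html($url);` in place of `file_get_contents($url)` and leave the rest of the script unchanged. To check whether file_get_contents() is allowed to open URLs on your host, inspect `ini_get('allow_url_fopen')`.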

Carsten Hellweg

<?php
    $url = 'removed for security purposes';
    $html = file_get_contents($url);
    $count = preg_match_all('/<a href="([^"]+)(png|jpg|mp4|\/)">[^<]*<\/a>/i', $html, $files);
    for ($i = 0; $i < $count; ++$i) {
       echo "File: " . $files[1][$i] . $files[2][$i] . "<br />\n";
    }
    var_dump($files);
 ?>

Replace png, jpg, and mp4 with the extensions you need. The trailing \/ alternative also matches links that end in a slash, i.e. subdirectories.
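
If the list of extensions changes often, one option is to collect all hrefs first and filter them afterwards instead of editing the pattern each time. This is only a sketch: `filter_by_extension` is a made-up helper name, and it assumes the hrefs are plain file names without query strings.

```php
<?php
/**
 * Keep hrefs that end in "/" (subdirectories) or whose extension is in
 * $extensions. The comparison is case-insensitive.
 */
function filter_by_extension(array $hrefs, array $extensions): array
{
    $allowed = array_map('strtolower', $extensions);

    $kept = array_filter($hrefs, function (string $href) use ($allowed): bool {
        if (substr($href, -1) === '/') {
            return true; // subdirectory link
        }
        $ext = strtolower(pathinfo($href, PATHINFO_EXTENSION));
        return in_array($ext, $allowed, true);
    });

    // Reindex so the result is a plain list.
    return array_values($kept);
}

// Hypothetical hrefs as they might come out of $files[1].
$hrefs = ['a.png', 'docs/', 'notes.txt', 'clip.MP4'];
print_r(filter_by_extension($hrefs, ['png', 'jpg', 'mp4']));
```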