A client of mine has asked me to create a simple site that monitors files on another site. He needs the file names tracked (I'm not sure why) and written to a file.

Here's the example source: http://pastebin.com/tyLUmCJr

I don't speak Russian, so I'm unaware of what the site's about. I apologize if it's anything that's 'less-than-suitable'.

Anyway, if you scroll to line 117, you will see a file name. I need to get all of the file names.

I've experimented with DOMDocument and third-party tools, but I suspect a regex would be faster. If anybody could point me in the right direction, it would be greatly appreciated.

Note: keep in mind that the source is stored in a string variable named $content.

Cheers!

1 Answer

After some more extensive research, I found a way to do it. Here's how I achieved it:

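<?php
    // Despite the file name, this include provides simple_html_dom's file_get_html().
    require_once("phpQuery.php");

    // Page range to scan; cast to int so request input cannot alter the URL.
    $min = isset($_GET['min']) ? (int) $_GET['min'] : 1;
    $max = isset($_GET['max']) ? (int) $_GET['max'] : 2;

    foreach (range($min, $max) as $page) {
        // Fetch and parse each listing page (the original fetched them but never used them).
        $html = file_get_html("http://www.fayloobmennik.net/files/list/" . $page . ".html");
        if (!$html) {
            continue; // Skip pages that failed to download or parse.
        }
        $elem = $html->find('div[id=info] table > tbody', 0);
        if (!$elem) {
            continue; // Skip pages without the expected listing table.
        }
        foreach ($elem->find('tr a') as $link) {
            // The second capture group is the link text, which holds the file name.
            $regex = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';
            if (preg_match($regex, $link, $match)) {
                // The site serves CP1251, so convert to UTF-8 before printing.
                echo iconv("CP1251", "UTF-8", $match[2]);
                echo "<br/>";
            }
        }
    }
?>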

Despite its file name, phpQuery.php here is the simple_html_dom library (I believe that's what it's called).
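
If the page source is already held in a string such as $content, as in the question, roughly the same extraction can be sketched with PHP's built-in DOMDocument and DOMXPath instead of a third-party parser. This is only a minimal sketch: the function name extractFileNames is made up for illustration, and the XPath assumes the links sit inside table rows under a div with id "info", mirroring the selector used above. Character encoding is not handled here, so the result depends on the charset the page declares.

```php
<?php
// Hypothetical helper: returns the text of every link in the listing table.
function extractFileNames(string $content): array
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $names = [];

    // Assumed structure: <div id="info"> ... <table> ... <tr> ... <a>name</a>
    foreach ($xpath->query('//div[@id="info"]//table//tr//a') as $link) {
        $name = trim($link->textContent);
        if ($name !== '') {
            $names[] = $name;
        }
    }

    return $names;
}
```

Calling extractFileNames($content) returns a plain array of names, which could then be written out with something like file_put_contents('names.txt', implode(PHP_EOL, $names)), where names.txt is just a placeholder path.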

Cheers.
