
I have a big problem. I want to parse a web page using PHP, and I don't understand why it doesn't work. I want to extract the "tr" tags from that page and then parse each extracted row by its "td" tags. The problem is that I can't parse the text, because between two tags there can be another two nested inside.

Is there any trick I should know about? I've been trying this for over 2 days and I still can't get a result.

This is the page:

http://www.tjareborg.fi/akkilahdot?DepartureIds=-1&CtryId=-1&DestinationAirportIds=-1&ResId=-1&QueryDurID=a&QueryDepDate=10.6.2011&LmsTypeId=2%2c3%2c1&PaxPrice=2167&SortAscending=True&page=0

All I want to do is parse that table, and get the content of every cell.

Thank you so much!!!

Nick Fortescue
Gigg
  • *(related)* [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Jun 10 '11 at 09:46
  • You might want to point out what you have already tried and show us some code. StackOverflow has many examples of how to parse HTML, and right now your question comes across as gimme-teh-codez. – Gordon Jun 10 '11 at 09:54
  • *(related)* [Robust and Mature HTML parser for PHP](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) – Gordon Jun 10 '11 at 09:57

2 Answers


Try:

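// Collect libxml parse errors internally instead of emitting PHP warnings
// (real-world HTML is rarely valid).
libxml_use_internal_errors(true);

$url = '%your url%'; // replace with the URL of the page to parse
$dom = new DOMDocument;
$dom->loadHTML(file_get_contents($url));

// Discard the collected parse errors.
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = array();
// Every <tr> inside the element with id "tblLmsList".
foreach ($xpath->query('//*[@id="tblLmsList"]//tr') as $tr) {
    $cells = array();
    // Direct <td> children of the current row; nodeValue includes the text of nested tags.
    foreach ($xpath->query('td', $tr) as $td) {
        $cells[] = trim($td->nodeValue);
    }

    // Skip rows without <td> cells, such as rows that contain only <th> headers.
    if (count($cells) > 0) {
        $rows[] = $cells;
    }
}

print_r($rows);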

Output:

Array
(
    [0] => Array
        (
            [0] => la 11.6.
            [1] => Varna
                Bulgaria
            [2] => Helsinki
            [3] => Matkajokeri
            [4] => 175,-
            [5] => 
            [6] => -
            [7] => 
            [8] => -
            [9] => 
        )

    [1] => Array
        (
            [0] => la 11.6.
            [1] => Varna
                Bulgaria
            [2] => Helsinki
            [3] => Pelkät lennot
            [4] => 150,-
            [5] => 
            [6] => -
            [7] => 
            [8] => -
            [9] => 
        )

...
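
The multi-line values in the output (for example "Varna" followed by "Bulgaria") come from line breaks and indentation inside the cells, which nodeValue returns unchanged. A minimal sketch of collapsing that whitespace, run against a small inline table rather than the live page. The sample HTML and the helper name parseTableRows are assumptions made for illustration; only the table id tblLmsList comes from the code above.

```php
<?php
// Hypothetical sample markup standing in for the real page; only the
// table id "tblLmsList" is taken from the answer above.
$html = <<<HTML
<html><body>
<table id="tblLmsList">
  <tr><th>Date</th><th>Destination</th><th>Price</th></tr>
  <tr>
    <td>la 11.6.</td>
    <td><a href="#">Varna</a>,
        <span>Bulgaria</span></td>
    <td><b>175,-</b></td>
  </tr>
</table>
</body></html>
HTML;

/**
 * Hypothetical helper: returns one array of cell strings for each table
 * row that contains at least one <td>.
 */
function parseTableRows(string $html, string $tableId): array
{
    // Collect parse errors internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = array();
    foreach ($xpath->query('//*[@id="' . $tableId . '"]//tr') as $tr) {
        $cells = array();
        foreach ($xpath->query('td', $tr) as $td) {
            // nodeValue already includes the text of nested tags;
            // collapse runs of whitespace into single spaces.
            $cells[] = trim(preg_replace('/\s+/u', ' ', $td->nodeValue));
        }
        if (count($cells) > 0) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$rows = parseTableRows($html, 'tblLmsList');
print_r($rows);
```

Whether to collapse the whitespace or split the cell on line breaks instead depends on how the values are used afterwards; the sketch only shows the first option.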
Yoshi
  • Don't use error suppression. Use [`libxml_use_internal_errors`](http://nl2.php.net/manual/en/function.libxml-use-internal-errors.php) and [`libxml_clear_errors`](http://nl2.php.net/manual/en/function.libxml-clear-errors.php) – Gordon Jun 10 '11 at 09:52
  • That works!! Thank you so much. You saved me! I'll start learning more about DOMDocument. It seems to work in this case. – Gigg Jun 10 '11 at 10:04
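
As the comment above suggests, libxml_use_internal_errors(true) makes libxml collect parse errors in a buffer instead of raising PHP warnings, so they can be inspected or discarded explicitly. A rough sketch, assuming a deliberately sloppy inline snippet in place of the real page (the sample markup and variable names are illustrative only):

```php
<?php
// Deliberately sloppy markup (unclosed <td>, an unknown tag) as a stand-in
// for real-world HTML; it is not taken from the actual page.
$html = '<table><tr><td>175,-<td>Helsinki</tr></table><foo>bar';

// Collect libxml errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$loaded = $dom->loadHTML($html);

// Each entry is a LibXMLError object (level, code, message, line, ...).
// Which problems are reported depends on the installed libxml version.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable even though the markup was not valid.
$cells = array();
foreach ($dom->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->nodeValue);
}
print_r($cells);
```

Compared with prefixing the call with @, this keeps the errors available for logging while still leaving the page output free of warnings.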

Try having a look at http://simplehtmldom.sourceforge.net/

Nick Fortescue
  • Besides hardly being an answer, because it doesn't show the OP how to achieve his goal, SimpleHTMLDom is a poor choice for a parser. It's slow, has a crappy codebase and is not based on libxml. See my link below the question for better alternatives to SimpleHtmlDom. – Gordon Jun 10 '11 at 09:48