
Possible Duplicate:
Robust, Mature HTML Parser for PHP
How to use wikipedia api if it exists?

I'm using YQL to pull information from Wikipedia and store it in my private database. For example, I'm scraping the Rajinikanth filmography page (http://en.wikipedia.org/wiki/Rajinikanth_filmography) and I need all the film names from it. I'm using this code:

JavaScript:

$.YQL("select * from html where url='http://en.wikipedia.org/wiki/Rajinikanth_filmography' and xpath='/html/body/div[3]/div[3]/div[4]/table'", function (data) {
    var str = data.query.results.table.tr;
    console.log(str);
    $.ajax({
        type: "POST",
        url: "db.php",
        data: {
            sendingStr: str
        },
        success: function (data) {
            console.log(data);
        }
    });
});

PHP:

$recv = $_POST['sendingStr'];
$arraySize = count($recv);
for ($i = 1; $i < $arraySize; $i++) {
    foreach ($recv[$i]["td"][1] as $value) {
        foreach ($value as $val) {
            if (strlen($val["content"]) >= 3) {
                echo $val["content"] . "\n";
            }
        }
    }
}

Here is my problem: as you can see on the page, rows in the table contain several cells with rowspan. But when I scrape it, I get only the first value from each row. What should I change in my code so that I get all the values?
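To illustrate what the rowspans do to the data, here is a minimal sketch, not a confirmed fix. It assumes YQL returns each row as an object with a `td` property (an array, or a single object when the row has one cell) and exposes the `rowspan` attribute as a string on the cell. The helper names `expandRowspans` and `hasPending` and the sample rows are hypothetical. A cell with `rowspan="2"` is absent from the following row, so a fixed index such as `td[1]` points at a different column there. Copying spanned cells down restores a stable column index:

```javascript
// Hypothetical helper: true if any column at or after `from` is still
// covered by a cell spanning down from an earlier row.
function hasPending(pending, from) {
  for (var i = from; i < pending.length; i++) {
    if (pending[i] && pending[i].left > 0) return true;
  }
  return false;
}

// Hypothetical helper: copy every rowspan cell into the rows it covers,
// so each output row has one entry per column. colspan is ignored here.
function expandRowspans(rows) {
  var pending = []; // pending[col] = { cell: ..., left: rows still to fill }
  return rows.map(function (row) {
    // Assumption: a row with a single cell arrives as an object, not an array.
    var cells = Array.isArray(row.td) ? row.td.slice() : (row.td ? [row.td] : []);
    var out = [];
    var col = 0;
    while (cells.length > 0 || hasPending(pending, col)) {
      if (pending[col] && pending[col].left > 0) {
        // This column is covered by a cell from an earlier row.
        out.push(pending[col].cell);
        pending[col].left -= 1;
      } else if (cells.length > 0) {
        var cell = cells.shift();
        var span = parseInt(cell.rowspan, 10) || 1;
        if (span > 1) pending[col] = { cell: cell, left: span - 1 };
        out.push(cell);
      } else {
        out.push(null); // malformed row: no cell available for this column
      }
      col += 1;
    }
    return out;
  });
}

// Made-up sample shaped like the assumed YQL output (not real page data).
var sampleRows = [
  { td: [{ rowspan: "2", content: "1975" }, { content: "Film A" }] },
  { td: { content: "Film B" } },
  { td: [{ content: "1976" }, { content: "Film C" }] }
];

var grid = expandRowspans(sampleRows);
var titles = grid.map(function (r) { return r[1].content; });
console.log(titles);
```

With the grid expanded in the browser, the second column holds the title in every row of this sample, so the PHP side could read index 1 for each row. On the real page the cell content may be nested inside link or formatting elements, which this sketch does not handle.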

ajyvardan
  • why do you scrape Wikipedia when you can use their API? – Gordon Sep 28 '12 at 13:38
  • why do you use YQL when you can parse HTML with PHP easily? – hakre Sep 28 '12 at 13:39
  • I'm more comfortable with YQL than HTML parsing and I don't have the time to read their documentation and master their API. I've got the whole code but I'm stuck with this. Can anyone help me? – ajyvardan Sep 28 '12 at 13:44
  • Have a look at [DBpedia](http://dbpedia.org/About): they do all the [complex] scraping for you and present you with structured data – Bergi Sep 28 '12 at 13:59
