Scraping information from Wikipedia

Question

Possible Duplicate:
Robust, Mature HTML Parser for PHP
How to use wikipedia api if it exists?

I'm using YQL to get information from Wikipedia and store it in my private database. For example, I'm scraping the Rajinikanth filmography page (http://en.wikipedia.org/wiki/Rajinikanth_filmography), and I need all the film names from that page. I'm using this code:

JavaScript:

$.YQL("select * from html where url='http://en.wikipedia.org/wiki/Rajinikanth_filmography' and xpath='/html/body/div[3]/div[3]/div[4]/table'", function (data) {
    // All <tr> rows of the table selected by the XPath above
    var str = data.query.results.table.tr;
    console.log(str);
    // Send the rows to the server-side script for processing
    $.ajax({
        type: "POST",
        url: "db.php",
        data: {
            sendingStr: str
        },
        success: function (response) {
            console.log(response);
        }
    });
});

PHP:

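// Table rows posted by the JavaScript above
$recv = $_POST['sendingStr'];
$arraySize = count($recv);
// Start at 1 to skip the header row
for ($i = 1; $i < $arraySize; $i++) {
    // Only the second <td> of each row is read here
    foreach ($recv[$i]["td"][1] as $value) {
        foreach ($value as $val) {
            // Ignore empty or very short strings
            if (strlen($val["content"]) >= 3) {
                echo $val["content"] . "\n";
            }
        }
    }
}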

Here is my problem: as you can see on the page, the rows of the table contain several cells with rowspan. But when I scrape it, I get only the first value from each row. What should I change in my code so that I get all the values?
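
To illustrate why rowspans cause this: a cell with rowspan="2" appears only in the first of the two rows it covers, so the second row has one fewer td and every later index shifts left by one. A fixed index such as td[1] then points at a different column in those rows. The following is a minimal sketch of one way to compensate, not a tested fix for the YQL response. The input shape (arrays of cells with `text` and `rowspan` properties), the function names `expandRowspans` and `hasCarry`, and the sample rows are all assumptions made for illustration; the real YQL JSON would have to be mapped into this shape first.

```javascript
// Expand rowspans so that every row has a value in every column.
// Assumed (hypothetical) input: an array of rows, each row an array of
// cells shaped like { text: "1999", rowspan: 2 }; rowspan is optional.
function expandRowspans(rows) {
  const grid = [];
  const carry = []; // carry[col] = { text, remaining } for cells spanning into later rows

  for (const row of rows) {
    const out = [];
    let next = 0; // index of the next unread cell in this row

    for (let col = 0; next < row.length || hasCarry(carry, col); col++) {
      const c = carry[col];
      if (c && c.remaining > 0) {
        // This column is still covered by a cell from an earlier row
        out[col] = c.text;
        c.remaining -= 1;
      } else if (next < row.length) {
        // Take the next real cell of this row and remember its span
        const cell = row[next++];
        const span = Number(cell.rowspan) || 1;
        out[col] = cell.text;
        carry[col] = { text: cell.text, remaining: span - 1 };
      } else {
        // Gap before a carried-over cell further right (malformed table)
        out[col] = "";
      }
    }
    grid.push(out);
  }
  return grid;
}

// True if any column at or after fromCol is still covered by an earlier cell
function hasCarry(carry, fromCol) {
  for (let c = fromCol; c < carry.length; c++) {
    if (carry[c] && carry[c].remaining > 0) return true;
  }
  return false;
}

// Made-up sample: the year cell of the first row spans two rows
const sampleRows = [
  [{ text: "1999", rowspan: 2 }, { text: "Film A" }, { text: "Role A" }],
  [{ text: "Film B" }, { text: "Role B" }],
  [{ text: "2002" }, { text: "Film C" }, { text: "Role C" }]
];

// After expansion the film title is always in column 1
console.log(expandRowspans(sampleRows).map(function (r) { return r[1]; }));
```

With the grid expanded this way, the film title sits in the same column for every row, so a fixed column index can be used again. Cells with colspan are not handled in this sketch.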

I'm more comfortable with YQL than with HTML parsing, and I don't have the time to read their documentation and master their API. I've got the whole code, but I'm stuck on this. Can anyone help me? – ajyvardan, Sep 28 '12 at 13:44
Have a look at [DBpedia](http://dbpedia.org/About) - they do all the [complex] scraping for you and present you with structured data – Bergi, Sep 28 '12 at 13:59
