I am trying to extract the information in the grey infobox (the summary box in the right-hand column, with fields such as type) from a page such as http://en.wikipedia.org/wiki/DressBarn.

I am using this API query: http://en.wikipedia.org/w/api.php?action=query&prop=extracts|info&exintro&titles=DressBarn&format=json&redirects&inprop=url&indexpageids, but it returns only the summary.

I experimented with the API sandbox but could not figure out how to extract the information contained in the grey box.

ROMANIA_engineer
ChicagoDude
  • Have a look at http://dbpedia.org, e.g. http://live.dbpedia.org/page/DressBarn. – svick Feb 01 '14 at 21:55
  • Possible duplicate of [Getting the infobox section of wikipedia](http://stackoverflow.com/q/3312346) (or possibly [content of infobox of wikipedia](http://stackoverflow.com/q/8088226) or [mediawiki api: how to get infobox from a wikipedia article](http://stackoverflow.com/q/7638402) or [Get all Wikipedia Infobox Templates and all Pages using them](http://stackoverflow.com/q/8000211) or [others](http://stackoverflow.com/search?q=wikipedia+infobox)...) – Ilmari Karonen Feb 08 '14 at 15:10
  • I see all these duplicate questions, but **all** of them are simply link-only answers to DBPedia. I voted to leave this open because I think it would be much better to at least have some example code of how this particular question would be answered with DBPedia exactly. – Joeytje50 Feb 08 '14 at 17:23

1 Answer

You can use PHP Simple HTML DOM Parser to load the rendered article page and read the cells of the infobox table directly.

<?php
// Path to the folder where you uploaded simple_html_dom.php
require_once('/homepages/0/d502303335/htdocs/js/simple_html_dom.php');

// Wikipedia page to parse
$html = file_get_html('https://en.wikipedia.org/wiki/Burger_King');

// Match on "infobox" only, so extra classes such as "vcard" do not matter
foreach ($html->find('table[class*=infobox]') as $element) {

    // Values sit in the <td> cells (right-hand column of the box)
    $values = [];
    foreach ($element->find('td') as $cell) {
        $text = trim($cell->plaintext);
        if ($text !== '') {
            $values[] = $text;
        }
    }

    // Labels sit in the <th> cells (left-hand column of the box)
    $labels = [];
    foreach ($element->find('th') as $cell) {
        $text = trim($cell->plaintext);
        if ($text !== '') {
            $labels[] = $text;
        }
    }

    // Empty cells are skipped in each list separately,
    // so check that the indexes line up for the page you parse
    print_r($labels);
    echo "<br><br><br>";
    print_r($values);

    // For example, to see what kind of industry Burger King is in
    if (isset($labels[2], $values[2])) {
        echo "Burger King is {$labels[2]}, {$values[2]}";
    }
}
?>
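If you would rather stay with the API from the question instead of scraping HTML, one option is to request the raw wikitext of the lead section (prop=revisions with rvprop=content and rvsection=0) and pull the fields out of the {{Infobox ...}} template yourself. The following is only a rough sketch under that assumption: the function names infoboxApiUrl and parseInfobox are made up for this example, and the parser only tracks nested {{...}} and [[...]], so unusual markup (triple braces, HTML comments containing pipes) will confuse it.

```php
<?php
// Hypothetical helper: builds the API URL that returns the wikitext of
// section 0, which is where the infobox template normally sits.
function infoboxApiUrl(string $title): string
{
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query([
        'action'    => 'query',
        'prop'      => 'revisions',
        'rvprop'    => 'content',
        'rvsection' => 0,
        'titles'    => $title,
        'format'    => 'json',
        'redirects' => 1,
    ]);
}

// Hypothetical helper: returns the "key = value" fields of the first
// {{Infobox ...}} template found in $wikitext, or [] if there is none.
function parseInfobox(string $wikitext): array
{
    $start = stripos($wikitext, '{{Infobox');
    if ($start === false) {
        return [];
    }

    $fields  = [];
    $current = '';
    $depth   = 0; // nesting level of {{...}} and [[...]]
    $len     = strlen($wikitext);

    for ($i = $start; $i < $len; $i++) {
        $pair = substr($wikitext, $i, 2);
        if ($pair === '{{' || $pair === '[[') {
            $depth++;
            $current .= $pair;
            $i++; // consume the second character of the pair
        } elseif ($pair === '}}' || $pair === ']]') {
            $depth--;
            if ($depth === 0) {
                $fields[] = $current; // end of the infobox itself
                break;
            }
            $current .= $pair;
            $i++;
        } elseif ($wikitext[$i] === '|' && $depth === 1) {
            // A pipe at the top level of the template separates fields
            $fields[] = $current;
            $current  = '';
        } else {
            $current .= $wikitext[$i];
        }
    }

    array_shift($fields); // drop the template name, e.g. "{{Infobox company"

    $result = [];
    foreach ($fields as $field) {
        $parts = explode('=', $field, 2);
        if (count($parts) === 2) {
            $result[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $result;
}

// Usage sketch (needs network access, so it is left commented out):
// $json = file_get_contents(infoboxApiUrl('DressBarn'));
// Decode $json, take the wikitext from the page's revisions entry,
// then pass it to parseInfobox().
```

The values come back as raw wikitext (for example [[Subsidiary]] rather than plain text), so you would still need to strip link and template markup before displaying them.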

If this answer suits your needs, please accept it and upvote it; it took me a lot of effort.

ROMANIA_engineer
Giacomo Pigani