I am trying to grab the infobox from the Wikipedia API using regex in PHP. Along with the required info, I am also getting some unnecessary content. How can I control this? I tried to match only the infobox class, but I guess my pattern is not correct. Has anyone worked on grabbing infobox contents? Is there an alternative to regex? Any help would be appreciated.

$url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Haw_Par_Villa&rvsection=0&rvparse";
$data = json_decode(file_get_contents($url), true);
$data = current($data['query']['pages']);
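// Greedy (.*) with the s flag matches from the first <table> to the last </table>,
// so everything between the first and last tables in the section is captured, not just the infobox.
$regex = '#<\s*?table\b[^>]*>(.*)</table\b[^>]*>#s';
//$regex = '(?=\{Infobox)(\{([^{}]|(?1))*\})';
// preg_match returns 1 on a match, 0 on no match, false on error.
if (preg_match($regex, $data["revisions"][0]['*'], $matches)) {
    echo $matches[0];
}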
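
One possible alternative to regex is to parse the returned HTML with PHP's DOM extension and select the table by its class. The following is only a minimal sketch: `extractInfobox` is a hypothetical helper name, and it assumes the parsed section contains a `<table>` whose class list includes `infobox`, which may not hold for every article.

```php
<?php
// Hypothetical helper: returns the HTML of the first <table> whose class
// list contains "infobox", or null if no such table exists.
function extractInfobox(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings instead of printing them for imperfect markup.
    libxml_use_internal_errors(true);
    // The XML encoding hint asks the parser to treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "infobox" as a whole class token, not as a substring.
    $query = "//table[contains(concat(' ', normalize-space(@class), ' '), ' infobox ')]";
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Serialize only the matched table, including any nested tables.
    return $doc->saveHTML($nodes->item(0));
}

// Usage with a small inline sample instead of a live API call.
$sample = '<div><table class="infobox vcard"><tr><td>Haw Par Villa</td></tr></table>'
        . '<table class="other"><tr><td>x</td></tr></table></div>';
echo extractInfobox($sample);
```

With the API response from the code above, the call would presumably be `extractInfobox($data["revisions"][0]['*'])`. Because the DOM parser tracks nesting, this avoids the problem of a pattern running on to a later `</table>`.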
