Right now I am using the MediaWiki API and the requests module to pull certain information from a sort of table on a Wikipedia page. As an example, take the song Zombie: there is a 'table' on the right of the page that lists the album, the author, the release date and so forth. The issue I'm running into is that I don't know how to query this data. I'm using this link as the endpoint: https://en.wikipedia.org/w/api.php?format=json&formatversion=2&action=query&titles=Zombie_(song)&prop=extracts to search for what I need, but it only returns the text of the page. I've tried the API sandbox and couldn't find a query that returns the information I need. I appreciate any advice and input, thanks.

Matthew Oujiri
See https://stackoverflow.com/questions/33862336/how-to-extract-information-from-a-wikipedia-infobox – Tgr May 31 '18 at 13:07

1 Answer

For that sort of metadata you'd be best off using Wikidata. In the sidebar on Wikipedia there's a link to the Wikidata item, and you can use an API query such as https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q169298 to get the data in a structured way. For information about what those results mean, see the Wikibase API docs.
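As a rough illustration of the wbgetclaims call above, here is a minimal Python sketch. It uses only the standard library (requests.get with a params dict would work just as well), the helper names claims_url, simplify_claims and fetch_claims are made up for this example, and the User-Agent string is a placeholder you should replace with your own. It assumes the usual response shape, where each statement carries its value under mainsnak → datavalue → value.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def claims_url(entity_id):
    """Build the wbgetclaims URL for one Wikidata entity (hypothetical helper)."""
    params = {"action": "wbgetclaims", "entity": entity_id, "format": "json"}
    return API + "?" + urllib.parse.urlencode(params)


def simplify_claims(response):
    """Map each property ID to the list of raw values found in its statements.

    Statements whose snaktype is not "value" (i.e. "novalue" or "somevalue")
    carry no datavalue and are skipped.
    """
    result = {}
    for prop, statements in response.get("claims", {}).items():
        values = []
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") == "value":
                values.append(snak["datavalue"]["value"])
        result[prop] = values
    return result


def fetch_claims(entity_id):
    """Perform the HTTP request; the User-Agent below is a placeholder."""
    request = urllib.request.Request(
        claims_url(entity_id),
        headers={"User-Agent": "infobox-example/0.1 (replace with your contact)"},
    )
    with urllib.request.urlopen(request) as resp:
        return simplify_claims(json.load(resp))


if __name__ == "__main__":
    claims = fetch_claims("Q169298")
    # P577 (publication date) and P175 (performer) are the property IDs
    # I would expect for a song; check the item page to confirm.
    print(claims.get("P577"))
    print(claims.get("P175"))
```

Item-valued properties come back as entity IDs (another Q number), so you would need a further lookup to turn those into human-readable labels.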

[Edit:] To get the entity ID, you can use wbgetentities with a Wikipedia title (titles) and wiki ID (sites); e.g.: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=Zombie_(song)
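The title lookup can be sketched in Python along these lines. This is only an illustration: the function names entity_url, extract_entity_id and lookup_entity_id are invented for the example, the User-Agent is a placeholder, and it assumes the response has a top-level "entities" object keyed by entity ID, with a "missing" marker on entries for titles that could not be resolved.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def entity_url(title, site="enwiki"):
    """Build the wbgetentities URL for a wiki page title (hypothetical helper)."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def extract_entity_id(response):
    """Return the first resolved entity ID in the response, or None."""
    for key, entity in response.get("entities", {}).items():
        # Unresolved titles are assumed to be flagged with a "missing" key.
        if "missing" not in entity:
            return entity.get("id", key)
    return None


def lookup_entity_id(title, site="enwiki"):
    """Perform the HTTP request; the User-Agent below is a placeholder."""
    request = urllib.request.Request(
        entity_url(title, site),
        headers={"User-Agent": "infobox-example/0.1 (replace with your contact)"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_entity_id(json.load(resp))


if __name__ == "__main__":
    print(lookup_entity_id("Zombie_(song)"))
```

The returned ID can then be passed as the entity parameter of a wbgetclaims query.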

Sam Wilson