I am using the MediaWiki API and the requests module to pull certain information from the table-like box on a Wikipedia page. As an example, take the song Zombie: there is a 'table' on the right that lists the album, the author, the release date and so forth. The problem is that I don't know how to query this data. I'm using https://en.wikipedia.org/w/api.php?format=json&formatversion=2&action=query&titles=Zombie_(song)&prop=extracts as the endpoint, but it returns the text of the page rather than the table. I've tried the API sandbox and couldn't find a query that gives me the information I need. I appreciate any advice and input, thanks.
See https://stackoverflow.com/questions/33862336/how-to-extract-information-from-a-wikipedia-infobox – Tgr May 31 '18 at 13:07
1 Answer
For that sort of metadata you'd be best off using Wikidata. In the sidebar on Wikipedia there's a link to the Wikidata item, and you can use an API query such as https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q169298 to get the data in a structured way. For information about what those results mean, see the Wikibase API docs.
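As a rough sketch of how that wbgetclaims call could be made and read from Python: the snippet below uses only the standard library (requests.get with a params dict should work much the same way). The helper names fetch_claims and simplify_claims and the User-Agent string are made up for illustration, and the parsing only handles item and time values in a simplified way.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def fetch_claims(entity_id):
    """Fetch the raw wbgetclaims response for one Wikidata entity (network call)."""
    query = urllib.parse.urlencode(
        {"action": "wbgetclaims", "entity": entity_id, "format": "json"}
    )
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        # Placeholder User-Agent (an assumption); replace with your own contact info.
        headers={"User-Agent": "infobox-example/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def simplify_claims(response):
    """Reduce a wbgetclaims response to {property ID: [values]}.

    Item values become their Q-ID, time values become the timestamp string,
    and anything else is passed through unchanged.
    """
    result = {}
    for prop, statements in response.get("claims", {}).items():
        values = []
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip "novalue" / "somevalue" statements
            value = snak["datavalue"]["value"]
            if isinstance(value, dict) and "id" in value:
                values.append(value["id"])
            elif isinstance(value, dict) and "time" in value:
                values.append(value["time"])
            else:
                values.append(value)
        if values:
            result[prop] = values
    return result
```

Calling simplify_claims(fetch_claims("Q169298")) should give a dictionary keyed by property ID. For example, P577 is Wikidata's "publication date" property, so if the item has that claim it would appear under that key. Item values come back as Q-IDs, which need a further lookup to turn into readable labels.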
[Edit:] To get the entity ID, you can use wbgetentities with a Wikipedia title (titles) and wiki ID (sites); e.g.: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=Zombie_(song)
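The title-to-ID lookup could be sketched like this, again with only the standard library. The names build_lookup_url and extract_entity_ids are hypothetical, and the handling of unknown titles assumes the API marks them with a "missing" key in the entities map.

```python
import urllib.parse

API_URL = "https://www.wikidata.org/w/api.php"


def build_lookup_url(title, site="enwiki"):
    """Build a wbgetentities URL that resolves a wiki page title to its Wikidata entity."""
    query = urllib.parse.urlencode(
        {
            "action": "wbgetentities",
            "sites": site,
            "titles": title,
            "format": "json",
        }
    )
    return f"{API_URL}?{query}"


def extract_entity_ids(response):
    """Return the entity IDs found in a decoded wbgetentities response.

    Entries flagged as missing (assumed to carry a "missing" key) are skipped.
    """
    return [
        entity_id
        for entity_id, entity in response.get("entities", {}).items()
        if "missing" not in entity
    ]
```

Fetch the URL from build_lookup_url("Zombie_(song)"), decode the JSON body, and pass it to extract_entity_ids; the resulting ID can then be used as the entity parameter of wbgetclaims.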

Sam Wilson
So, from here, how can I read through this for information from the infobox? – Matthew Oujiri May 31 '18 at 20:57
Check out the question that @Tgr linked to above, it's got more info. – Sam Wilson May 31 '18 at 22:59