How can I get the Infobox from a Wikipedia article by the MediaWiki API?

Question

Wikipedia articles may have Infobox templates. With the following call I can get the first section of an article, which includes the Infobox.

http://en.wikipedia.org/w/api.php?action=parse&pageid=568801&section=0&prop=wikitext

I want a query that returns only the Infobox data. Is this possible?

See [How to extract information from a Wikipedia infobox?](https://stackoverflow.com/questions/33862336/how-to-extract-information-from-a-wikipedia-infobox) for a more detailed answer. — Tgr, Jul 19 '17 at 22:33

score 38 · Answer 1 · edited Oct 18 '20 at 19:16

You can do it with a URL call to the Wikipedia API like this:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xmlfm&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0

Replace the value of titles= with your page title, and change format=xmlfm to format=json if you want the response in JSON format.
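
As a rough sketch of how this call could be used to get only the infobox: the query returns the wikitext of section 0, so the `{{Infobox ...}}` template can be cut out by matching braces. The helper names `build_query_url` and `extract_infobox` are made up for illustration, and the code assumes the template name starts with "Infobox" and that the braces are balanced. It uses https rather than http.

```python
import re
from urllib.parse import urlencode, quote

API = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, fmt="json"):
    """Build the revisions query shown above for an arbitrary page title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvsection": 0,
        "format": fmt,
        "titles": title,
    }
    # quote_via=quote encodes spaces as %20 instead of '+'
    return API + "?" + urlencode(params, quote_via=quote)


def extract_infobox(wikitext):
    """Return the first {{Infobox ...}} template as a string, or None."""
    match = re.search(r"\{\{\s*infobox", wikitext, re.IGNORECASE)
    if match is None:
        return None
    start = match.start()
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":        # a template (possibly nested) opens
            depth += 1
            i += 2
        elif pair == "}}":      # a template closes
            depth -= 1
            i += 2
            if depth == 0:      # the infobox itself just closed
                return wikitext[start:i]
        else:
            i += 1
    return None  # unbalanced braces: give up rather than guess


sample = (
    "{{About|x}}\n"
    "{{Infobox song\n| name = Scary Monsters\n"
    "| length = {{Duration|m=4|s=3}}\n}}\n"
    "'''Text''' {{cite}}"
)
print(build_query_url("Scary Monsters and Nice Sprites"))
print(extract_infobox(sample))
```

The result is still raw wikitext, so nested templates and links inside the field values would need further parsing.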

answered Dec 12 '12 at 21:06 by Gaʀʀʏ; edited Oct 18 '20 at 19:16 by Peter Mortensen

score 17 · Answer 2 · edited Oct 25 '18 at 09:14

Instead of parsing infoboxes yourself, which is quite complicated, take a look at DBpedia, which has Wikipedia infoboxes extracted as database objects.
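
A minimal sketch of what a DBpedia lookup could look like, assuming the public SPARQL endpoint at https://dbpedia.org/sparql and assuming the resource name is the Wikipedia title with spaces replaced by underscores. The helper `dbpedia_query_url` is hypothetical, and the `format` parameter is an assumption about what the endpoint accepts.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint


def dbpedia_query_url(title):
    """Build a request URL that lists every property/value of one resource."""
    # Assumption: DBpedia resource names mirror Wikipedia titles
    resource = "http://dbpedia.org/resource/" + title.replace(" ", "_")
    query = f"SELECT ?property ?value WHERE {{ <{resource}> ?property ?value }}"
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


print(dbpedia_query_url("Scary Monsters and Nice Sprites"))
```

Fetching that URL (for example with requests.get) should return the properties as JSON, though which of them come from the infobox is not something this sketch can tell you.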

answered Nov 02 '11 at 04:28 by lambshaanxy; edited Oct 25 '18 at 09:14 by Ankit Bhardwaj

This, however, will give you all the relationships to a particular entity but won't tell you which fields exist in the infobox. – MFARID, Mar 23 '15 at 17:22

IIUC they don't provide any databases via API, only some extraction tools. So you need to fetch everything locally. – Onkeltem, Sep 24 '18 at 12:27

score 5 · Answer 3 · edited Oct 18 '20 at 19:19

Building on Gaʀʀʏ's answer, you can have Wikipedia parse the infobox into HTML for you via the rvparse parameter, like so:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse

Note that neither method will return just the infobox. From the HTML content, however, you can extract the table with class infobox (for example, with Beautiful Soup).

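In Python, you can do something like the following:

import requests

url = "https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse"
resp = requests.get(url).json()
# 'pages' is keyed by page ID, so take the first (and only) page
page_one = next(iter(resp['query']['pages'].values()))
revisions = page_one.get('revisions', [])
# In the legacy JSON format, the content is stored under the '*' key
html = revisions[0]['*']
# Now parse the HTML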
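
The final "parse the HTML" step could be sketched as follows. To stay self-contained, this uses the standard library's html.parser rather than Beautiful Soup (where the equivalent lookup would be roughly `soup.find("table", class_="infobox")`). The class `InfoboxParser` is a made-up name, and the sketch assumes each infobox row is a `<tr>` with one `<th>` label and one `<td>` value, both explicitly closed.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect (label, value) pairs from the first <table class="infobox ...">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # table nesting depth inside the infobox (0 = outside)
        self.done = False   # True once the first infobox has been closed
        self.cell = None    # 'th' or 'td' while inside a top-level cell
        self.row = {"th": [], "td": []}  # text pieces of the current row
        self.rows = []      # finished (label, value) pairs

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth or "infobox" in classes:
                self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.row = {"th": [], "td": []}
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            self.done = self.depth == 0
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = None
        elif self.depth == 1 and tag == "tr":
            label = " ".join(self.row["th"]).strip()
            value = " ".join(self.row["td"]).strip()
            if label and value:  # skip image rows and section headers
                self.rows.append((label, value))

    def handle_data(self, data):
        if self.depth and self.cell and data.strip():
            self.row[self.cell].append(data.strip())


# Stand-in for the `html` string returned by the API call
sample_html = (
    '<div><table class="infobox vevent"><tbody>'
    '<tr><th>Released</th><td>October 22, <a href="#">2010</a></td></tr>'
    '<tr><td colspan="2">image</td></tr>'
    '</tbody></table>'
    '<table><tr><th>X</th><td>Y</td></tr></table></div>'
)
parser = InfoboxParser()
parser.feed(sample_html)
print(parser.rows)
```

Real infobox markup is messier than this sample (line breaks, footnote markers, nested lists), so the collected values would likely need some cleanup.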

score 4 · Answer 4 · edited Oct 18 '20 at 19:21

If the page has a right-side infobox, use this URL to obtain it in plain-text form.

My example uses the element hydrogen. All you need to do is replace "hydrogen" with your title.

https://en.wikipedia.org/w/index.php?action=raw&title=Template:Infobox%20hydrogen

If you are looking for JSON format, use this URL, but the output is not pretty.

https://en.wikipedia.org/w/api.php?action=parse&page=Template:Infobox%20hydrogen&format=json

answered May 25 '17 at 12:49 by Michael DiCioccio; edited Oct 18 '20 at 19:21 by Peter Mortensen

I think this works because this page exists: https://en.wikipedia.org/wiki/Template:Infobox_hydrogen. Replacing hydrogen alone with, let's say, "summer" doesn't work. – Mariano Soto, Jul 12 '21 at 20:37
