14

Is it possible to query the Wikipedia API for articles that contain a specific template? The documentation does not describe any action that would filter search results to pages that contain a template. Specifically, I am after pages that contain Template:Persondata. After that, I am hoping to be able to retrieve just that specific template in order to populate genealogy data for the openancestry.org project.

The query below shows that the Albert Einstein page contains the Persondata template, but it doesn't return the contents of the template, and I don't know how to get a list of page titles that contain the template: http://en.wikipedia.org/w/api.php?action=query&prop=templates&titles=Albert%20Einstein&tlcontinue=736|10|ParmPart

Returns:

<api>
 <query>
  <pages>
   <page pageid="736" ns="0" title="Albert Einstein">
    <templates>
     ...
     <tl ns="10" title="Template:Persondata"/>
     ...
    </templates>
   </page>
  </pages>
 </query>
 <query-continue>
  <templates tlcontinue="736|10|Reflist"/>
 </query-continue>
</api>

I suspect that I can't get what I need from the API, but I'm hoping I'm wrong and that someone has already blazed a trail down this path.

Peter Mortensen
grenade

3 Answers

7

You can use the embeddedin query to find all pages that include the template:

curl 'http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Persondata&eilimit=5&format=xml'

Which gets you:

<?xml version="1.0"?>
<api>
  <query>
    <embeddedin>
      <ei pageid="307" ns="0" title="Abraham Lincoln" />
      <ei pageid="308" ns="0" title="Aristotle" />
      <ei pageid="339" ns="0" title="Ayn Rand" />
      <ei pageid="340" ns="0" title="Alain Connes" />
      <ei pageid="344" ns="0" title="Allan Dwan" />
    </embeddedin>
  </query>
  <query-continue>
    <embeddedin eicontinue="10|Persondata|595" />
  </query-continue>
</api>

See full docs at mediawiki.org.

Edit: Use the embeddedin query instead of backlinks, which doesn't cover template inclusions.
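For anyone scripting this rather than using curl, here is a minimal Python sketch of the same embeddedin request. The helper names (`embeddedin_url`, `parse_embeddedin`) are made up for illustration. It adds `einamespace=0` to restrict results to the article namespace, and it reads the continuation token from the `query-continue` element shown above. The fallback to a top-level `continue` element is an assumption about newer API responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint from the curl example


def embeddedin_url(template, limit=5, namespace=0, eicontinue=None):
    """Build a list=embeddedin URL (hypothetical helper, not part of any library)."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": namespace,  # 0 = main (article) namespace
        "eilimit": limit,
        "format": "xml",
    }
    if eicontinue is not None:
        params["eicontinue"] = eicontinue  # resume where the last batch ended
    return API_URL + "?" + urlencode(params)


def parse_embeddedin(xml_text):
    """Return (titles, continuation token or None) from an XML response."""
    root = ET.fromstring(xml_text)
    titles = [ei.get("title") for ei in root.iter("ei")]
    # Response shape shown above: <query-continue><embeddedin eicontinue="..."/>
    node = root.find("query-continue/embeddedin")
    if node is None:
        # Assumed shape of newer responses: <continue eicontinue="..."/>
        node = root.find("continue")
    token = node.get("eicontinue") if node is not None else None
    return titles, token
```

To walk the whole list, fetch `embeddedin_url(...)` (for example with `urllib.request.urlopen`), pass the body to `parse_embeddedin`, and repeat with the returned token until it is `None`.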

lambshaanxy
  • +1 That's cool and nearly does what I need but for some reason the results tend to be from all but the main namespace rendering it useless for my needs. Even if I append blnamespace=0 as per the docs the search will not return data from the articles namespace which is where all of the useful persondata biographies will be. Persondata in the talk namespace is pretty much useless. – grenade Nov 08 '10 at 09:20
  • Oops, apparently that doesn't cover template inclusions. But the `embeddedin` query does, so try this instead: `http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Persondata&format=xml` – lambshaanxy Nov 08 '10 at 23:04
3

Using embeddedin does not let you search for a specific person, because the search string becomes Template:Persondata itself.

The best way I've found to get only people from Wikipedia is to use list=search and filter the search using AND"Born"AND"Occupation":

http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch="Tom Cruise"AND"Born"AND"Occupation"&format=jsonfm&srprop=snippet&srlimit=50

Remember that Wikipedia uses a search engine that doesn't yet allow us to search only the title; it searches the full text. You can take advantage of that to get more precise results.
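If you build these URLs in code, the quotes and spaces need escaping. Below is a rough Python sketch that assembles the same srsearch string and URL-encodes it. The function name `person_search_url` is hypothetical, and it requests `format=json` rather than `jsonfm`, on the assumption that a script wants raw JSON instead of the HTML debugging view.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint from the URL above


def person_search_url(name, required_terms=("Born", "Occupation"), limit=50):
    """Build a list=search URL that requires extra terms in the full text.

    Hypothetical helper: it only reproduces the query string used above.
    """
    # Quote each phrase and join with AND, e.g. "Tom Cruise"AND"Born"AND"Occupation"
    srsearch = "AND".join(f'"{term}"' for term in (name, *required_terms))
    params = {
        "action": "query",
        "list": "search",
        "srsearch": srsearch,
        "srprop": "snippet",
        "srlimit": limit,
        "format": "json",  # assumption: machine-readable output instead of jsonfm
    }
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(person_search_url("Tom Cruise"))
```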

rybo111
user2419708
1

The accepted answer explains how to list pages using a certain template, but if you need to search for pages using the template, you can do so with the `hastemplate:` search keyword: https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=hastemplate:NPOV%20physics
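As a rough sketch of how that could look from Python: the helper names below are made up, and the parser assumes the usual `format=json` shape for list=search, where hits appear under `query.search` with a `title` field each.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # endpoint from the URL above


def hastemplate_search_url(template, terms="", limit=10):
    """Build a list=search URL using the hastemplate: keyword (hypothetical helper)."""
    # e.g. "hastemplate:NPOV physics"; terms may be empty
    srsearch = f"hastemplate:{template} {terms}".strip()
    params = {
        "action": "query",
        "list": "search",
        "srsearch": srsearch,
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def search_titles(json_text):
    """Extract page titles from a list=search JSON response body."""
    data = json.loads(json_text)
    # Assumed response shape: {"query": {"search": [{"title": ...}, ...]}}
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]
```

Fetch the URL with any HTTP client and pass the response body to `search_titles`.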

Tgr