64

I'm trying to find out if there's a Wikipedia API (I think it is related to MediaWiki?).

If so, how would I tell Wikipedia to give me an article about the New York Yankees, for example?

What would the REST URL be for this example?

All the docs on this subject seem fairly complicated.

logi-kal
chris
  • 5
    The "if it exists" part is also covered here: http://stackoverflow.com/questions/627594/is-there-a-wikipedia-api. But I think the "how to use it" part is a legitimate question... sort of. – Jonik Jun 08 '09 at 12:14
  • There is now an R package that accesses the MediaWiki API (and so Wikipedia); more details and an example: http://stackoverflow.com/a/24027866/1036500 – Ben Jun 04 '14 at 02:06

8 Answers

82

You really, really need to spend some time reading the documentation; it took me only a moment to look through it and find the right request. :/ But out of sympathy, I'll provide a link that you can hopefully learn from.

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=timestamp|user|comment|content

Those are the parameters you will be looking to set. Your best bet is to know the page you are after and put the part of its Wikipedia URL after wiki/ into the titles parameter, i.e.:

http://en.wikipedia.org/wiki/New_York_Yankees [Take the part after the wiki/]

-->

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=timestamp|user|comment|content

[Place it in the titles parameter of the GET request.]

The URL above can be tweaked to get the different sections you do or do not want. So read the documentation :)
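The substitution described above can be sketched in Python. This is only an illustration, not part of the original answer: the function names are hypothetical, and the https scheme and the format=json parameter are assumptions added here (the URLs above use http and the default output format).

```python
from urllib.parse import urlencode, urlsplit, unquote

# Assumption: the https form of the endpoint used in the answer above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def title_from_article_url(article_url):
    """Return the page title, i.e. the part of the URL path after /wiki/."""
    path = urlsplit(article_url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError("not a /wiki/ article URL: " + article_url)
    return unquote(path[len(prefix):])


def revisions_url(title, rvprop=("timestamp", "user", "comment", "content")):
    """Build the prop=revisions query URL for a single page title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # Multiple values are joined with pipes; urlencode escapes them as %7C.
        "rvprop": "|".join(rvprop),
        # Assumption: JSON output is wanted.
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    title = title_from_article_url("http://en.wikipedia.org/wiki/New_York_Yankees")
    print(revisions_url(title))
```

Building the query string with urlencode rather than by hand keeps titles containing spaces or other special characters correctly escaped.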

Shadi Almosri
  • 19
    +1 for an actual example, instead of just dumping links (even though the example is also just a link... :) – Jonik Jun 08 '09 at 12:11
  • 6
    A FANTASTIC PLACE to start is with the wikipedia sandbox. It can help you format your requests/queries: http://en.wikipedia.org/wiki/Special:ApiSandbox – LucianNovo Feb 10 '13 at 08:23
  • 3
    What if I don't know the specific page? Like if I want to search for the band Iron Maiden? The page could be "iron maiden", "Iron Maiden", "Iron Maiden band". How do I search for that? – Rodrigo Ruiz May 20 '14 at 16:23
67

The answers here helped me arrive at a solution, but I discovered more info in the process that may be useful to others who find this question. I figure most people simply want to use the API to quickly get content off the page. Here is how I'm doing that:

Using Revisions:

//working url:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Threadless&rvprop=content&format=json&rvsection=0&rvparse=1

//Explanation
//Base Url:
http://en.wikipedia.org/w/api.php?action=query

//tell it to get revisions:
&prop=revisions

//define page titles separated by pipes. In the example I used the T-shirt company Threadless
&titles=whatever|the|title|is

//specify that we want the page content
&rvprop=content

//I want my data in JSON, default is XML
&format=json

//lets you choose which section you want. 0 is the first one.
&rvsection=0

//tell wikipedia to parse it into html for you
&rvparse=1

Using Extracts (better/easier for what I'm doing):

//working url:
http://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Threadless&format=json&exintro=1

//only explaining new parameters
//instead of revisions, we'll set prop=extracts
&prop=extracts

//if we just want the intro, we can use exintro. Otherwise it shows all sections
&exintro=1

As was mentioned, the full details require reading through the API documentation, but I hope these examples will help the majority of the people who come here for a quick fix.
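As a rough sketch of how the extracts request might be used from code, the Python below builds the URL and pulls the extract out of a JSON response. The function names are hypothetical, the sample response is a hand-written placeholder in the shape the query module returns with format=json (the page ID and text are made up), and explaintext is an optional parameter of the extracts module that is not used in the URLs above; it asks for plain text instead of HTML.

```python
import json
from urllib.parse import urlencode

# Assumption: the https form of the endpoint used above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def extracts_url(title, intro_only=True, plain_text=False):
    """Build a prop=extracts query URL for one page title."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "format": "json",
    }
    if intro_only:
        params["exintro"] = 1      # only the text before the first section
    if plain_text:
        params["explaintext"] = 1  # plain text instead of HTML
    return API_ENDPOINT + "?" + urlencode(params)


def first_extract(response_text):
    """Return the extract of the first page that has one, else None."""
    data = json.loads(response_text)
    # Pages are keyed by page ID, which is not known in advance.
    for page in data["query"]["pages"].values():
        if "extract" in page:
            return page["extract"]
    return None


# Hand-written placeholder response, not real API output.
SAMPLE_RESPONSE = (
    '{"query": {"pages": {"12345": {"pageid": 12345, '
    '"title": "Threadless", "extract": "<p>Placeholder intro text.</p>"}}}}'
)

if __name__ == "__main__":
    print(extracts_url("Threadless"))
    print(first_extract(SAMPLE_RESPONSE))
```

To use it against the live API, fetch the URL with any HTTP client and pass the response body to first_extract.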

Andy Groff
  • The first working url provided also allows you to retrieve the infobox for the wiki page! Thanks – Gaʀʀʏ Dec 12 '12 at 21:12
  • Hi, is there a way to get the plain text of the main description? It's very difficult to parse wikitext or HTML responses :(. Any help would be highly appreciated. – Punith Raj Jun 06 '13 at 19:39
13

See http://www.mediawiki.org/wiki/API

Specifically, for the English Wikipedia, the API is located at http://en.wikipedia.org/w/api.php

Nemo
drdaeman
  • 3
    Yeah, I can't figure out how to do my example after reading that. Any ideas? – chris Jun 08 '09 at 11:34
  • 3
    No, I seriously can't figure that document out. I don't know how to get specific page data using that API. – chris Jun 08 '09 at 11:37
  • 3
    You actually can't. To get raw article source you should access the articles this way: http://www.mediawiki.org/w/index.php?title=API&action=raw – drdaeman Jun 08 '09 at 12:26
10

Have a look at the ApiSandbox (https://en.wikipedia.org/wiki/Special:ApiSandbox), a web frontend for querying the API easily. A few clicks will build the URL for you and show the API result.

It is a MediaWiki extension, enabled on all Wikipedia language editions: https://www.mediawiki.org/wiki/Extension:ApiSandbox

Nemo
8

If you want to extract structured data from Wikipedia, you may consider using DBpedia: http://dbpedia.org/

It lets you query data by given criteria using SPARQL, and it returns data parsed from Wikipedia infobox templates.

There are SPARQL libraries available for multiple platforms to make queries easier.
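To give an idea of what such a query might look like, here is a minimal Python sketch that builds a SPARQL request URL using only the standard library. Everything specific in it is an assumption rather than something stated in this answer: the public endpoint at https://dbpedia.org/sparql, its query and format parameters, the dbo:abstract property, and the resource name New_York_Yankees. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def abstract_query(resource_name, lang="en"):
    """Return a SPARQL query asking for a resource's abstract in one language."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource_name}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def sparql_url(query):
    """Build a GET URL that sends the query and asks for JSON results."""
    params = {
        "query": query,
        # Assumption: the endpoint accepts this result format name.
        "format": "application/sparql-results+json",
    }
    return SPARQL_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(sparql_url(abstract_query("New_York_Yankees")))
```

A dedicated SPARQL library would handle the request and result parsing for you; the sketch only shows what goes over the wire.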

Niklas
Maksym Kozlenko
3

If you want to extract structured data from Wikipedia, you may also try http://www.wikidata.org/wiki/Wikidata:Main_Page

brian.clear
1

Below is a working example that prints the first sentence from Wikipedia's New York Yankees page to your web browser's console:

<!DOCTYPE html>
<html>
    <head>
        <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
    </head>
    <body>
        <script>
            var wikiUrl = "http://en.wikipedia.org/w/api.php?action=opensearch&search=New_York_Yankees&format=json&callback=wikiCallbackFunction";

            $.ajax(wikiUrl, {
                dataType: "jsonp",
                success: function( wikiResponse ) {
                    console.log( wikiResponse[2][0] );
                }
            });
        </script>   
    </body>
</html>

http://en.wikipedia.org/w/api.php is the endpoint for your URL. You can see how to structure your URL by visiting: http://www.mediawiki.org/wiki/API:Main_page

I used jsonp as the dataType to allow cross-site requests. More can be found here: http://www.mediawiki.org/wiki/API:Cross-site_requests

Last but not least, make sure to reference the jQuery.ajax() API: http://api.jquery.com/jquery.ajax/
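For readers working outside the browser, the same opensearch lookup can be sketched in Python. This is an illustration under stated assumptions: the function names are hypothetical, the https endpoint is assumed, and the sample response is a hand-written placeholder in the four-element array shape that opensearch returns (search term, titles, descriptions, URLs). Index 2 is the descriptions list, matching wikiResponse[2][0] in the jQuery example.

```python
import json
from urllib.parse import urlencode

# Assumption: the https form of the endpoint named above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def opensearch_url(search_term):
    """Build an action=opensearch URL; no JSONP callback is needed server-side."""
    params = {"action": "opensearch", "search": search_term, "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def first_description(response_text):
    """Return the first entry of the descriptions list, or None if it is empty."""
    data = json.loads(response_text)
    descriptions = data[2]
    return descriptions[0] if descriptions else None


# Hand-written placeholder response, not real API output.
SAMPLE_RESPONSE = json.dumps([
    "New York Yankees",
    ["New York Yankees"],
    ["Placeholder description."],
    ["https://en.wikipedia.org/wiki/New_York_Yankees"],
])

if __name__ == "__main__":
    print(opensearch_url("New York Yankees"))
    print(first_description(SAMPLE_RESPONSE))
```

Because opensearch matches on a search term rather than an exact title, the titles list at index 1 is also a way to discover the page name when you do not know it in advance.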

JSON C11
0

Wiki Parser converts Wikipedia dumps into XML. It is also quite fast. You can then use any XML processing tool to handle the data from the parsed Wikipedia articles.

PlinyTheElder