8

I need to retrieve a list of all existing languages for a certain wiki project. For example, all Wikivoyage or all Wikipedia languages, just like on their landing pages.

I would prefer to do this via the MediaWiki API, if possible.

Thanks for your time.

Damjan Pavlica

3 Answers

8

Approach 3: Using an API in the Wikimedia wiki farm and Extension:SiteMatrix

https://commons.wikimedia.org/w/api.php?action=sitematrix&smtype=language

While this returns every wiki the matrix knows about, the result is easily filtered client-side by code [as of now, one of: wiki (Wikipedia), wiktionary, wikibooks, wikinews, wikiquote, wikisource, wikiversity, wikivoyage] and by closed state. It takes a single request with some response-body overhead, but since the response is easily cached and compresses well, that overhead is not severe.
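The client-side filtering could look roughly like the sketch below. It assumes the JSON reply has a top-level `sitematrix` object whose language groups each carry a `code` and a `site` list, and that a closed wiki is marked by the mere presence of a `closed` key. The function names, the added `format=json` parameter and the User-Agent string are illustrative choices, not part of the answer.

```python
import json
import urllib.request

# Endpoint from the answer, with format=json added so the reply is parseable.
SITEMATRIX_URL = (
    "https://commons.wikimedia.org/w/api.php"
    "?action=sitematrix&smtype=language&format=json"
)


def fetch_sitematrix(url=SITEMATRIX_URL):
    """Download the matrix and return the value of its "sitematrix" key."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "sitematrix-example/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["sitematrix"]


def open_sites(matrix, project_code):
    """Map language code -> site URL for the open wikis of one project."""
    sites = {}
    for group in matrix.values():
        # Skip non-group entries such as the integer "count".
        if not isinstance(group, dict):
            continue
        for site in group.get("site", []):
            # Test for the key itself: its value may be an empty string.
            if site.get("code") == project_code and "closed" not in site:
                sites[group["code"]] = site["url"]
    return sites
```

Calling `open_sites(fetch_sitematrix(), "wikivoyage")` would then give the open Wikivoyage editions, provided the response has the shape assumed here.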

Rainer Rillke
  • 1
    This seems like a solution to my problem. I'll need some time to test it and give feedback. – Damjan Pavlica Nov 14 '15 at 18:38
  • 1
    If you are using a loosely typed language, make sure to test the `closed` property against undefined or with `.hasOwnProperty()`, as an empty string may evaluate to false. – Rainer Rillke Nov 14 '15 at 18:40
7

Approach 1: Using an API in the Wikimedia wiki farm

To get all interwiki prefixes that a wiki knows of, use the siteinfo meta module of the MediaWiki API and query any project with siprop=interwikimap:

https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap

You will get a large array of objects like this:

{
    "prefix": "aa",
    "local": "",
    "language": "Qaf\u00e1r af",
    "url": "https://aa.wikipedia.org/wiki/$1",
    "protorel": ""
}

protorel tells you whether the URL is protocol-relative (i.e. starting with //). For the Wikimedia wikis, the URLs start with https. The $1 in the URL is, as you might have guessed, a placeholder for the title.
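As a small illustration of the $1 placeholder, the sketch below turns one interwikimap entry into a page URL. The helper name `page_url` and the `scheme` parameter are hypothetical, and converting spaces to underscores is an assumption about how titles appear in URLs.

```python
from urllib.parse import quote


def page_url(entry, title, scheme="https"):
    """Fill the $1 placeholder of an interwikimap entry with a page title."""
    url = entry["url"]
    # Protocol-relative URLs start with "//"; give them an explicit scheme.
    if url.startswith("//"):
        url = f"{scheme}:{url}"
    # Assumption: spaces become underscores, the rest is percent-encoded.
    return url.replace("$1", quote(title.replace(" ", "_")))
```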

To get only the wikis in the same wikifarm (e.g. Wikimedia wikis), add sifilteriw=local to your query:

https://sv.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap&sifilteriw=local

To fetch the names in your language, use siinlanguagecode, like this (all Wikimedia wikis, with their Swedish names, retrieved from the Arabic Wikipedia, though it could have been any endpoint in the wiki farm):

https://ar.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap&sifilteriw=local&siinlanguagecode=sv

From here you would have to filter out e.g. the Wikipedias yourself.
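One way to do that filtering is sketched below. It assumes the parsed reply exposes the list under `query` → `interwikimap`, that language editions are the entries carrying a `language` key, and that the edition's project can be recognised from the host name in `url`. The function name and the `domain` parameter are illustrative.

```python
from urllib.parse import urlparse


def language_editions(interwikimap, domain="wikipedia.org"):
    """Map prefix -> language name for entries hosted under one project domain."""
    editions = {}
    for entry in interwikimap:
        # hostname is also parsed correctly for protocol-relative URLs.
        host = urlparse(entry["url"]).hostname or ""
        if "language" in entry and host.endswith("." + domain):
            editions[entry["prefix"]] = entry["language"]
    return editions
```

Given a parsed response `data`, `language_editions(data["query"]["interwikimap"])` would list the Wikipedias. Closed wikis are still included, since the interwiki map does not carry that status.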

Approach 2: Using Wikistats at wmflabs

A list already filtered by type of project is available at http://wikistats.wmflabs.org (csv), where you can filter out Wikipedia, Wikiversity, etc. The CSV file is updated on a daily basis, but the tool is experimental and might not be there forever.

In either approach, Wikimedia Incubator wikis will not show up.

leo
  • Is there a way to return the current status of the languages? I get this list https://en.wiktionary.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap&sifilteriw=local&format=json&formatversion=2&callback=JSON_CALLBACK but many projects have been closed. – Damjan Pavlica Nov 10 '15 at 16:14
  • 2
    You can check for each project if it's closed, using meta=siteinfo: https://ang.wikiquote.org/w/api.php?action=query&meta=siteinfo&siprop=general%7Cnamespaces%7Cnamespacealiases%7Cstatistics But I don't think you can filter the list from the start (I might be wrong) – leo Nov 10 '15 at 16:51
  • Thanks for your answers. I will leave this question open; maybe someone will come up with a solution. – Damjan Pavlica Nov 10 '15 at 20:28
3

Subtract closed.dblist from wikipedia.dblist (other lists), then remove wiki from the end and replace _ with -.
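The steps above can be sketched as follows, assuming each dblist is a plain-text file with one database name per line. Skipping blank lines and lines starting with `#` is an assumption about the file format, and the function name is hypothetical.

```python
def subdomains(all_dblist, closed_dblist, suffix="wiki"):
    """Return subdomains of the open wikis, derived from database names."""

    def names(lines):
        # Assumption: blank lines and "#" comment lines carry no database name.
        stripped = (line.strip() for line in lines)
        return {name for name in stripped if name and not name.startswith("#")}

    open_dbs = names(all_dblist) - names(closed_dblist)
    # Drop the trailing suffix and replace "_" with "-".
    return sorted(
        db[: -len(suffix)].replace("_", "-")
        for db in open_dbs
        if db.endswith(suffix)
    )
```

Both arguments can be open file objects or lists of strings. Database names whose canonical domain differs from the derived one would need a manual mapping on top of this.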

Tgr
  • 1
    Unfortunately this hasn't worked properly in all cases since https://phabricator.wikimedia.org/T11823 (September 2015) - be_x_oldwiki has the canonical domain be-tarask.wikipedia.org - I did set up a redirect from the old domain though. I expect we'll have more of these cases in future. – Krenair Nov 06 '16 at 04:13