
I want to get the names of all articles in a category and its subcategories.

Options I'm aware of:

  1. Using the Wikipedia API. Does it have such an option?
  2. Downloading the dump. Which format would be better for my usage?
  3. Searching Wikipedia with something like incategory:"music", but I didn't see an option to get those results as XML.

Please share your thoughts.

Termininja
Noam

3 Answers


The following resource will help you to download all pages from the category and all its subcategories:

http://en.wikipedia.org/wiki/Wikipedia:CatScan

There is also an API available here:

https://www.mediawiki.org/wiki/API:Categorymembers

Datageek

You can do this through the following two API methods:

To list the pages in a category:

YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Music

To list its subcategories:

YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtype=subcat&cmtitle=Category:Music

More information is available in the MediaWiki API documentation.
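As a rough sketch, the two requests above could be issued from Python along these lines. The endpoint URL, the User-Agent string, and the function names (`build_query`, `http_get_json`, `category_members`) are assumptions made for illustration. The API returns results in batches, so the sketch passes the `continue` values from each response back into the next request until none are returned.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; use YOUR_URL/api.php for another wiki.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(category, member_type=None, cont=None):
    """Return the query parameters for one list=categorymembers request."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "max",
    }
    if member_type:
        params["cmtype"] = member_type  # "page", "subcat" or "file"
    if cont:
        params.update(cont)  # continuation values from the previous response
    return params


def http_get_json(params):
    """Fetch one API response over HTTP and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; replace it with one that identifies your client.
    request = urllib.request.Request(
        url, headers={"User-Agent": "category-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def category_members(category, member_type=None, fetch=http_get_json):
    """Collect member titles, following 'continue' until it is absent."""
    titles, cont = [], None
    while True:
        data = fetch(build_query(category, member_type, cont))
        titles.extend(m["title"] for m in data["query"]["categorymembers"])
        cont = data.get("continue")
        if not cont:
            return titles
```

Calling `category_members("Category:Music", "page")` would then return article titles and `category_members("Category:Music", "subcat")` the subcategory titles. The `fetch` parameter exists only so the HTTP call can be swapped out, for example in tests.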

Martin Thoma
Adexe Rivera

Note that Wikipedia's categorization system is not a tree, or even an acyclic graph. It is quite possible that by continually following subcategory links you will eventually wind up back where you started.
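One way to guard against such loops is to remember which categories have already been visited. The following is a minimal sketch, not a definitive implementation: `walk_categories` and the toy graph are hypothetical, and `get_subcategories` stands in for whatever lookup is actually used (an API call or a query against a dump).

```python
from collections import deque


def walk_categories(root, get_subcategories, max_depth=None):
    """Breadth-first walk that visits each category at most once."""
    seen = {root}
    queue = deque([(root, 0)])
    order = []
    while queue:
        category, depth = queue.popleft()
        order.append(category)
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the depth limit
        for sub in get_subcategories(category):
            if sub not in seen:  # skip anything already queued or visited
                seen.add(sub)
                queue.append((sub, depth + 1))
    return order


# Toy graph containing a cycle: Music -> Jazz -> Music.
toy = {
    "Category:Music": ["Category:Jazz", "Category:Rock"],
    "Category:Jazz": ["Category:Music"],
    "Category:Rock": [],
}
print(walk_categories("Category:Music", lambda c: toy.get(c, [])))
# prints ['Category:Music', 'Category:Jazz', 'Category:Rock']
```

The optional `max_depth` is a second safeguard, since even without cycles a broad category can fan out into a very large number of descendants.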

If you are going to be making many such queries, you would be best served by downloading a database dump. If you will do this only occasionally and only for small categories, you could probably get away with making repeated queries to list=categorymembers.

incategory:"music" does not appear to do subcategory searching.

Anomie
  • Would you recommend downloading the XML or SQL for my purpose? – Noam Apr 26 '11 at 08:09
  • @Noam: Whichever is more convenient for you, really. Note that you may only need the categorylinks.sql dump, or that and the page.sql dump, depending on just what you are trying to do. – Anomie Apr 26 '11 at 15:46
  • @Anomie Do you have any reference (or example) for the claim that Wikipedia categories are not acyclic? – Peter Franek Apr 25 '18 at 20:35
  • While they may have been fixed by now, https://en.wikipedia.org/wiki/Wikipedia:Dump_reports/Category_cycles lists 100 examples that existed around June 2016. – Anomie Apr 26 '18 at 10:22