I am trying to get all Wikipedia articles for a category and its subcategories.

So far I have only worked out a small part of the problem, namely how to use the MediaWiki API. For example, for Category:Geography, I used the following API request to list the members of the category:

https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography&cmlimit=100

I got this JSON response:

{  
   "batchcomplete":"",
   "query":{  
      "categorymembers":[  
         {  
            "pageid":5883021,
            "ns":14,
            "title":"Category:Branches of geography"
         },
         {  
            "pageid":5782300,
            "ns":14,
            "title":"Category:Geography by place"
         },
         {  
            "pageid":8700702,
            "ns":14,
            "title":"Category:Geography awards and competitions"
         },
         ...
      ]
   }
}
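
For reference, the `ns` field in each entry is the namespace number: 14 is the Category namespace and 0 is the main (article) namespace. A minimal sketch of separating the two, assuming the response has already been parsed into a Python dict (the helper name `split_members` is hypothetical, not part of any library):

```python
def split_members(response):
    """Split a parsed categorymembers response into subcategories and articles."""
    members = response.get("query", {}).get("categorymembers", [])
    # Namespace 14 holds categories; namespace 0 holds ordinary articles.
    subcategories = [m["title"] for m in members if m["ns"] == 14]
    articles = [m["title"] for m in members if m["ns"] == 0]
    return subcategories, articles


# The three entries shown in the response above (the elided rest is left out).
sample = {
    "batchcomplete": "",
    "query": {
        "categorymembers": [
            {"pageid": 5883021, "ns": 14, "title": "Category:Branches of geography"},
            {"pageid": 5782300, "ns": 14, "title": "Category:Geography by place"},
            {"pageid": 8700702, "ns": 14, "title": "Category:Geography awards and competitions"},
        ]
    },
}

subcategories, articles = split_members(sample)
```

With only these three entries, every member is a subcategory, so `articles` comes back empty.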

Now my problem is: how do I use this in a Python script that collects all the articles? I have also run into another problem: if I open the first category, Branches of geography, it contains more categories and subcategories. How do I write a script that traverses all the way down until it reaches the articles, saves them to a text file, and then moves back up the category tree to collect more?
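
One possible shape for such a script, as a rough sketch rather than a definitive implementation: a breadth-first walk over the category tree that remembers which categories it has already visited and stops at a depth limit. The function names (`fetch_members`, `collect_articles`, `save_titles`), the `max_depth` default, and the User-Agent string are all assumptions made up for illustration. The request parameters are the same `categorymembers` ones used in the URL above, plus the API's `continue` values for categories with more members than one request returns.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_members(category):
    """Return all members of one category, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
    }
    members = []
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        # Hypothetical User-Agent; replace it with one that identifies you.
        request = urllib.request.Request(
            url, headers={"User-Agent": "category-crawler-example/0.1"}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        members.extend(data["query"]["categorymembers"])
        if "continue" not in data:
            return members
        # Send the continuation values back to get the next batch.
        params.update(data["continue"])


def collect_articles(root, fetch=fetch_members, max_depth=3):
    """Walk the category tree breadth-first and return sorted article titles."""
    visited = {root}  # categories already queued, so cycles cannot loop forever
    queue = deque([(root, 0)])
    articles = set()
    while queue:
        category, depth = queue.popleft()
        for member in fetch(category):
            title = member["title"]
            if member["ns"] == 14:  # subcategory: queue it once, within the depth limit
                if depth < max_depth and title not in visited:
                    visited.add(title)
                    queue.append((title, depth + 1))
            elif member["ns"] == 0:  # ordinary article
                articles.add(title)
    return sorted(articles)


def save_titles(titles, path):
    """Write one article title per line to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(titles))
```

Calling `save_titles(collect_articles("Category:Geography", max_depth=1), "articles.txt")` would write the titles found in the root category and its direct subcategories. A broad category such as Geography grows very quickly with depth, so a small `max_depth` is a sensible starting point.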

Termininja
windboy
  • Note that there is nothing that prevents a category from containing its parent, so make sure you have some mechanism to avoid infinite loops. Aside from that, see any of the other questions asking the same thing – leo May 26 '16 at 07:49
  • I did that and read all the related categories, but now I am stuck and do not know how to continue. – windboy May 26 '16 at 08:46
  • What have you done so far, and where did you get stuck? It would be helpful if you could show the code you already have. (Most people here are more than happy to help out, but few want to write other people's code for them...) – leo May 26 '16 at 09:20
