I'd like to download all the Wikipedia pages in HTML.
The Wikipedia API does an excellent job of fetching a wiki article in HTML, provided the article title is given.
I use the snippet below to fetch a wiki article given its title:
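import urllib
import wikipedia  # third-party "wikipedia" package

title = 'Barack Obama'
ny = wikipedia.page(title)     # look up the page by its title
data = urllib.urlopen(ny.url)  # Python 2; on Python 3 use urllib.request.urlopen
htmlSource = data.read()       # raw HTML of the article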
The above snippet gives me the wiki page (HTML) on Barack Obama.
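For a known list of titles, I assume the same approach could simply be looped and each page written to disk. Below is a minimal sketch of that idea, not a tested solution: `save_pages`, `safe_filename`, and the `fetch_html` callable are hypothetical names I made up for illustration, and `fetch_html` is assumed to return the page HTML as a string for a given title.

```python
import os
import re


def safe_filename(title):
    """Turn an article title into a filesystem-friendly file name."""
    return re.sub(r"[^\w\-]+", "_", title).strip("_") + ".html"


def save_pages(titles, fetch_html, out_dir):
    """Fetch each title with fetch_html(title) and write the HTML into out_dir.

    fetch_html is a hypothetical callable: it takes a title and returns
    the page HTML as a string. Returns the list of file paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for title in titles:
        html = fetch_html(title)
        path = os.path.join(out_dir, safe_filename(title))
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(html)
        saved.append(path)
    return saved
```

Here `fetch_html` would wrap the title lookup and URL fetch from my snippet. This still leaves open how to enumerate every article title on Wikipedia, which is the part I am stuck on.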
I need the HTML specifically because I have written some regexes to extract relevant information from the page.
I'd be glad if anybody could help me accomplish this task.