I am trying to scrape Wikitravel for specific sections, such as Climate and Get in. I have managed to get the page XML from them through Special:Export:
http://wikitravel.org/en/Special:Export/San_Francisco
The data comes back as XML, but the page text inside it is still wiki markup. I searched for a way to turn that markup into usable text, but could not find a suitable solution.
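To show what I mean by "XML with wiki markup inside", here is a minimal Python sketch (my real code is PHP) that pulls the raw markup out of an export document. The sample XML is a made-up, heavily trimmed stand-in for the real export, the namespace version in it is an assumption, and `extract_wikitext` is just an illustrative name.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed stand-in for a Special:Export response.
# The namespace version is an assumption; the real export may differ.
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>San Francisco</title>
    <revision>
      <text>==Understand==
===Climate===
Mild all year.
==Get in==
By plane.</text>
    </revision>
  </page>
</mediawiki>"""


def extract_wikitext(export_xml):
    """Return the content of the first <text> element, ignoring namespaces."""
    root = ET.fromstring(export_xml)
    for element in root.iter():
        # Tags look like "{namespace}text"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "text":
            return element.text or ""
    return ""


if __name__ == "__main__":
    print(extract_wikitext(SAMPLE_EXPORT))
```

This part works for me; what comes out is still raw wiki markup (`==Get in==` and so on), which is where I am stuck.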
I tried writing a PHP function with regular expressions to convert the markup into HTML, but the output is not uniform, so it is very difficult to select specific data from it.
I also tried building a MediaWiki API URL so I could fetch the content programmatically: http://wikitravel.org/en/api.php?format=xml&action=query&titles=Main%20Page&prop=revisions&rvprop=content but it does not work.
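For reference, this is roughly how I am putting that URL together, written as a minimal Python sketch rather than my actual PHP. It only builds the URL and does not send the request. The function name `build_query_url` is illustrative, and the assumption that the API lives at `/en/api.php` on this site is mine and may well be the problem.

```python
from urllib.parse import quote, urlencode

# Assumption: the API endpoint sits at this path on the site.
API_ENDPOINT = "http://wikitravel.org/en/api.php"


def build_query_url(title, endpoint=API_ENDPOINT):
    """Build a query URL asking for the latest revision content of a page."""
    params = {
        "format": "xml",
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    return endpoint + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_query_url("Main Page"))
```

Calling `build_query_url("Main Page")` gives the same URL as the one above, so I do not think the encoding of the parameters is the issue.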
Can you please help me with this? Has anyone successfully scraped Wikipedia? Is there a tutorial or any other technique I can refer to?