This question may be simpler than it seems, but after reading a lot of material I'm still confused.
I have downloaded a Wikimedia dump (this one, to be precise: enwiktionary-20151002-pages-articles-multistream.xml.bz2, which supposedly contains all articles from the English Wiktionary). What I want is to get the content of a specific article by its title (the same way you would look it up on Wiktionary itself).
Note: I don't want the rendered HTML (as generated by the site). I want the "real" content, i.e. the raw wikitext you see when you click "edit" on any article.
In a few words:
- Find the article with a given title, e.g. "book"
- Get the content
How should I go about that?
P.S. I'm not looking for a language-specific solution. I just need some ideas as to how this can be approached.
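
One possible approach, shown as a minimal sketch in Python purely for illustration (any language with a streaming XML parser and bz2 support should work the same way): decompress the dump on the fly, stream through the XML, and stop at the first <page> whose <title> matches. It assumes the usual MediaWiki export layout, where each <page> holds a <title> and a <revision> containing a <text> element with the wikitext. The function names find_wikitext and find_in_dump are made up for this example.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def find_wikitext(xml_stream, wanted_title):
    """Return the wikitext of the page titled wanted_title, or None if absent.

    xml_stream is any binary file-like object yielding the dump's XML.
    Assumes each <page> contains one <title> and one <revision>/<text>.
    """
    context = ET.iterparse(xml_stream, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element

    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue

        title = None
        text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""

        if title == wanted_title:
            return text

        # Discard pages already inspected so memory use stays bounded.
        root.clear()

    return None


def find_in_dump(dump_path, wanted_title):
    """Open a .xml.bz2 dump and search it without unpacking it to disk."""
    with bz2.open(dump_path, "rb") as stream:
        return find_wikitext(stream, wanted_title)
```

Usage would be something like find_in_dump("enwiktionary-20151002-pages-articles-multistream.xml.bz2", "book"). This is a linear scan, so each lookup may read a large part of the file. For repeated lookups, the multistream dumps are published with a companion index file that maps titles to byte offsets, which should let you decompress only the relevant block, but I have not sketched that here.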