It is easy to download dumps of Wikipedia in XML format. However, the articles themselves are written in wikitext, which has a template system, so extracting clean full texts from these dumps requires expanding the templates. Wikipedia provides an API to do so, but it is not suitable for expanding an entire dump. Several scripts can be found that deal with wikitext, such as this one written in Python, but they all seem outdated or simply don't handle templates. Another way of tackling the problem would be to run MediaWiki locally and use API:Expandtemplates, but that seems like a rather cumbersome solution. Finally, HTML dumps also exist, but I prefer to work with expanded wikitext since it makes it easier to deal with wikilinks, tables, sections, etc.
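For reference, here is roughly what the API route looks like for a single snippet (a minimal sketch using the `requests` library against the public `en.wikipedia.org` endpoint; it works fine interactively, but one HTTP round trip per article is exactly why it doesn't scale to a full dump):

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def expand_templates(wikitext: str) -> str:
    """Expand templates in a wikitext snippet via the public API.

    Fine for a handful of pages; impractical for a whole dump,
    since every page costs one HTTP request.
    """
    resp = requests.get(API_URL, params={
        "action": "expandtemplates",
        "text": wikitext,
        "prop": "wikitext",
        "format": "json",
    })
    resp.raise_for_status()
    return resp.json()["expandtemplates"]["wikitext"]

print(expand_templates("{{convert|1|km|mi}}"))
```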
My goal here is to extract clean text while keeping the wikilinks and discarding complicated templates such as infoboxes. Do you have any idea how to tackle this template expansion problem?
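To make the goal concrete, here is a rough sketch of the fallback I have in mind using `mwparserfromhell` (the helper name is just for illustration). Note that it only *removes* templates rather than expanding them, which covers the infobox case but silently loses templates whose expansion carries real text, such as dates or unit conversions:

```python
import mwparserfromhell

def strip_templates(wikitext: str) -> str:
    """Remove top-level templates (e.g. infoboxes) but keep wikilinks.

    This does not expand anything: templates that would have
    produced visible text simply disappear from the output.
    """
    code = mwparserfromhell.parse(wikitext)
    for template in code.filter_templates(recursive=False):
        code.remove(template)
    return str(code)

sample = "{{Infobox person|name=Ada}}[[Ada Lovelace|Ada]] was a [[mathematician]]."
print(strip_templates(sample))
# -> "[[Ada Lovelace|Ada]] was a [[mathematician]]."
```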