I have a bunch of bloated, similar, machine-generated XML files (roughly 50% metadata and tags, 50% text) that I would like to convert into simple Markdown (MD) files, so that I can use them as sources for a pandoc factory. I started writing a script that replaces the tags one by one with regex statements, but I am sure there must be a smarter way. Can anyone give me a push in the right direction, please?
-
If transforming XML, I'd go for XSLT; regex seems way too complicated for this task. – Filburt May 24 '16 at 20:42
-
Regex is exactly the wrong tool for this. Regular expressions are powerful, but XML is not a regular language, and attempting to use regex will just result in frustration and wasted time: unless the XML is extremely simple and regular, you'll eventually have to switch to XSLT anyway. XSLT is _designed_ to transform XML to any other format. Here's the [obligatory StackOverflow answer](http://stackoverflow.com/a/1732454/18157) for anyone who asks "can I use regex to parse XML". – Jim Garrison May 24 '16 at 20:50
-
And BTW, the question as it currently stands is off-topic as it is much too broad. If you want some help, [edit] it to include a representative sample of your input and corresponding desired output, and any XSL you've already written. – Jim Garrison May 24 '16 at 20:53
-
Use XSLT or similar to go from your XML dialect to DocBook or XHTML, then use pandoc to convert from that to Markdown. – mb21 May 24 '16 at 21:51
-
@JimGarrison Nitpicking: The linked answer concerns parsing (X)HTML, not XML. – Filburt May 27 '16 at 08:11
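-
To illustrate the point made in the comments (transform the parsed tree instead of rewriting tags with regex), here is a minimal sketch. It is not XSLT: it swaps in Python's standard-library `xml.etree.ElementTree` as a stand-in for a tree-based transform, so that it runs without extra dependencies. The question gives no sample input, so the element names (`doc`, `meta`, `title`, `section`, `heading`, `para`, `emph`) and the helper names `inline` and `to_markdown` are purely hypothetical and would need to be mapped onto the real dialect.

```python
import xml.etree.ElementTree as ET

# Hypothetical input dialect: these element names are assumptions,
# not taken from the question.
SAMPLE = """<doc>
  <meta><generator>tool 1.0</generator></meta>
  <title>Report</title>
  <section>
    <heading>Intro</heading>
    <para>Some <emph>important</emph> text.</para>
  </section>
</doc>"""


def inline(elem):
    """Flatten an element's mixed content into one Markdown string."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline(child)
        # Wrap <emph> runs in asterisks; other inline elements pass through as text.
        parts.append(f"*{inner}*" if child.tag == "emph" else inner)
        parts.append(child.tail or "")
    # Collapse the indentation and line breaks left over from the XML layout.
    return " ".join("".join(parts).split())


def to_markdown(root):
    """Walk the tree in document order and emit Markdown blocks."""
    blocks = []

    def walk(elem, depth):
        for child in elem:
            if child.tag == "meta":
                continue  # drop metadata subtrees entirely
            if child.tag == "title":
                blocks.append("# " + inline(child))
            elif child.tag == "heading":
                blocks.append("#" * depth + " " + inline(child))
            elif child.tag == "para":
                blocks.append(inline(child))
            elif child.tag == "section":
                walk(child, depth + 1)  # nested sections get deeper headings
            # any other element is silently skipped in this sketch

    walk(root, 1)
    return "\n\n".join(blocks) + "\n"


markdown = to_markdown(ET.fromstring(SAMPLE))
print(markdown)
```

The same mapping could be expressed as XSLT templates, one per element; if the dialect is large or deeply nested, the XSLT-to-XHTML-then-pandoc route suggested above is likely to scale better than a hand-written walker like this one.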