I'm trying to get valid, pretty-printed XML in order to pass it on to requests.
However, XML "prettified" by BeautifulSoup looks like this:
...
<typ>
TYPE_1
</typ>
<rte>
AL38941XXXXX
</rte>
<sts>
ADDED
</sts>
...
A handy way of dealing with such messy output is described here:
import re
text_re = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
prettyXml = text_re.sub(r'>\g<1></', uglyXml)
which gives:
<typ>TYPE_1</typ>
<rte>AL38941XXXXX</rte>
<sts>ADDED</sts>
However, the regex simply skips empty values, which leads to problems when some of the values in the parsed string are empty.
Example:
<typ>TYPE_1</typ>
<rte>AL38941XXXXX</rte>
<sts>ADDED</sts>
<ref>
</ref>
Then requests tries to run the query with a parameter of ' ' for the empty tag, which leads to an incorrect query result.
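To make the issue reproducible, here is a minimal self-contained sketch. The sample string and its indentation are made up for illustration (the `<row>` wrapper is not part of my real data); the pattern is the one quoted above, written with raw strings.

```python
import re

# Hypothetical sample of prettify()-style output; <ref> has no value.
ugly_xml = (
    "<row>\n"
    " <typ>\n"
    "  TYPE_1\n"
    " </typ>\n"
    " <ref>\n"
    " </ref>\n"
    "</row>"
)

# Joins "<tag>\n   text\n   </tag>" into "<tag>text</tag>".
# The capture group must start with a non-whitespace, non-bracket character,
# so elements that contain only whitespace never match.
text_re = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
pretty_xml = text_re.sub(r'>\g<1></', ugly_xml)

print(pretty_xml)
```

With this input, `<typ>` is collapsed onto one line, while `<ref>` keeps the newline and indentation between its opening and closing tags.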
I'm not really fluent in regex, so I tried >\n\s+</ in another regex, failed, and hacked it like this:
text_re = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
prettyXml = text_re.sub(r'>\g<1></', uglyXml).replace('>\n ', '><').replace('>\n ', '><')
And all the "pretty" markup is sadly gone... It kind of works, but how should this be done properly?
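One direction I have been considering, as a rough sketch only: keep the first substitution and add a second one that collapses elements whose body is nothing but whitespace, leaving the indentation between sibling tags alone. The names `empty_re` and `tidy` are my own placeholders, and the pattern assumes simple tag names with optional attributes and no `>` inside attribute values.

```python
import re

# First pass: join "<tag>\n   text\n   </tag>" into "<tag>text</tag>".
text_re = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)

# Second pass (hypothetical): an opening tag, whitespace only, then the
# matching closing tag. Group 1 is the tag name, group 2 any attributes.
empty_re = re.compile(r'<([^\s<>/]+)((?:\s[^<>]*)?)>\s+</\1>')


def tidy(ugly_xml):
    """Collapse text-only and whitespace-only elements onto one line."""
    joined = text_re.sub(r'>\g<1></', ugly_xml)
    return empty_re.sub(r'<\1\2></\1>', joined)


sample = "<row>\n <typ>\n  TYPE_1\n </typ>\n <ref>\n </ref>\n</row>"
print(tidy(sample))
```

Because the second pattern requires the closing tag to directly follow the whitespace, a parent element whose first child is another tag is not touched, so the line breaks between siblings survive. I am not sure this covers every case (comments, CDATA, mixed content), which is partly why I am asking.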