I have the following situation.
XML file:
<tag1/>
<tag2>some_data</tag2>
<tag1>some_another_data</tag1>
tag1 is sometimes self-closing and sometimes has data inside.
Code:
from BeautifulSoup import BeautifulStoneSoup
s = '<tag1/><tag2>some_data</tag2><tag1>some_another_data</tag1>'
soup1 = BeautifulStoneSoup(s)
soup2 = BeautifulStoneSoup(s, selfClosingTags=["tag1"])
print soup1.prettify()
print
print soup2.prettify()
Output:
<tag1>
<tag2>
some_data
</tag2>
</tag1>
<tag1>
some_another_data
</tag1>
<tag1 />
<tag2>
some_data
</tag2>
<tag1 />
some_another_data
In the first case, tag1 swallows the tags that follow it (up to the next tag1), because self-closing tags are not supported by default. In the second case, tag1 is always treated as self-closing, so it cannot contain anything and its data ends up outside the tag.
I just want to get the same structure as the original XML document. Is this possible with BeautifulSoup? If so, how can I have the self-closing syntax recognized for all tags by default? There are a lot of XML files, and I don't want to search for all such cases manually.
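For comparison, here is a minimal sketch of the structure I expect, using the standard library's xml.etree.ElementTree rather than BeautifulSoup. It assumes the fragment is wrapped in a hypothetical <root> element (my addition for this example, since a strict XML parser requires a single root; real files may already have one).

```python
import xml.etree.ElementTree as ET

s = '<tag1/><tag2>some_data</tag2><tag1>some_another_data</tag1>'

# Assumption: the fragment has no single root element, so wrap it in a
# hypothetical <root> to make it well-formed for a strict parser.
root = ET.fromstring('<root>' + s + '</root>')

# Collect (tag name, text) for each top-level element, in document order.
# An empty element such as <tag1/> has text None.
structure = [(child.tag, child.text) for child in root]

for tag, text in structure:
    print('%s: %r' % (tag, text))
```

Here the three elements stay siblings: the first tag1 is empty, tag2 holds some_data, and the second tag1 holds some_another_data. That is the result I would like to get from BeautifulSoup as well.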