I am working with XML log files that I would like to pretty-print. That would be no problem, except that the log files contain error lines where the XML is broken: a block that is closed without ever being opened, or a block that is opened but never closed. For example:
<root>
<time>2021-07-28 10:27:49,869</time><modification_request id="11d18Dqwerty" ytrew:qw2="url:qwertyu:qwerty:qwer:qw:qwer:0:0"><objectclass>objectID</objectclass><identifier qwerty>123321</identifier><modification operation="delete"><valueObject ytrew:qw2="url:qwertyu:qwerty:qwer:qw:qwer:0:0" ytrew:qw2="url:qwertyu:qwerty:qwer:qw:qwer:0:0:type="8"><objectA>123321</objectA></valueObject></modification></qw2:modifyRequest>
<time>2021-07-28 10:27:49,881</time><modification_response id="11d18Dqwerty" ytrew:qw2="url:qwertyu:qwerty:qwer:qw:qwer:0:0"><objectclass>Error</objectclass><identifier qwerty>123321<modification operation="delete"><objectA>123321</objectA></valueObject></modification></qw2:modifyRequest>
</root>
Like in the second log line above, where <identifier qwerty> is never closed (not closed XML block).
Currently I am using this code:
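import xml.dom.minidom

pretty = None
try:
    with open(r'output\output_xml.txt', 'r') as file:
        # parseString() expects a single string, so use read(), not readlines()
        s = file.read()
    pretty = xml.dom.minidom.parseString(s).toprettyxml()
except Exception:
    print('Pretty function calling unsuccessful')

# Only overwrite the file if parsing succeeded; opening with 'w' truncates it
if pretty is not None:
    try:
        with open(r'output\output_xml.txt', 'w') as writer:
            writer.write(pretty)
    except OSError:
        print('Writing to file unsuccessful')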
If the only way to parse the file successfully is to fix the unclosed blocks, then I guess it would be a good idea to skip the broken lines instead. Thanks everyone :)
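To make the "skip broken lines" idea concrete, here is a minimal sketch, assuming each log record sits on a single line as in the sample above. The helper name pretty_print_lines and the temporary <record> wrapper element are made up for illustration; only xml.dom.minidom.parseString, toprettyxml and ExpatError come from the standard library.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def pretty_print_lines(lines):
    """Pretty-print every line separately.

    Returns (pretty_chunks, skipped), where skipped holds the 1-based
    numbers of the lines that could not be parsed.
    """
    pretty_chunks = []
    skipped = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        # Ignore blank lines and the bare wrapper tags of the log file
        if not line or line in ('<root>', '</root>'):
            continue
        try:
            # A line holds several sibling elements, so wrap it in a
            # temporary <record> element (hypothetical name) before parsing.
            dom = xml.dom.minidom.parseString('<record>' + line + '</record>')
        except ExpatError:
            # Broken XML on this line: remember it and move on
            skipped.append(number)
            continue
        pretty_chunks.append(dom.documentElement.toprettyxml(indent='  '))
    return pretty_chunks, skipped


sample = [
    '<root>',
    '<time>t1</time><req id="1"><obj>123</obj></req>',
    '<time>t2</time><resp id="1"><obj>123</resp>',  # <obj> is never closed
    '</root>',
]
pretty_chunks, skipped = pretty_print_lines(sample)
print(''.join(pretty_chunks))
print('Skipped lines:', skipped)
```

Line numbers of the records that fail to parse end up in skipped, so they could be logged or written to the output unchanged instead of being dropped silently.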