
I am working with XML log files that I would like to "pretty print". That would be no problem, except that the log file contains error lines where the XML is broken: a block that is never opened, or a block that is never closed. For example:

<root>
<time>2021-07-28 10:27:49,869</time><modification_request id="11d18Dqwerty" ytrew:qw2="url:qwertyu:qwerty:qwer:qw:qwer:0:0"><objectclass>objectID</objectclass><identifier qwerty>123321</identifier><modification operation="delete"><valueObject ytrew:qw2="url:qwertyu:qwerty:qwer:qw:qwer:0:0" ytrew:qw2="url:qwertyu:qwerty:qwer:qw:qwer:0:0:type="8"><objectA>123321</objectA></valueObject></modification></qw2:modifyRequest>
<time>2021-07-28 10:27:49,881</time><modification_response id="11d18Dqwerty" ytrew:qw2="url:qwertyu:qwerty:qwer:qw:qwer:0:0"><objectclass>Error</objectclass><identifier qwerty>123321<modification operation="delete"><objectA>123321</objectA></valueObject></modification></qw2:modifyRequest>
</root>
(In the second log line, the <identifier> block is not closed.)

Currently I am using this code:

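            import xml.dom.minidom
            from xml.parsers.expat import ExpatError

            pretty = None
            try:
                with open(r'output\output_xml.txt', 'r') as file:
                    # parseString expects one string, so use read(), not readlines()
                    s = file.read()
                pretty = xml.dom.minidom.parseString(s).toprettyxml()
            except (OSError, ExpatError) as exc:
                print('Pretty function calling unsuccessful:', exc)
            # Only overwrite the file if parsing actually produced output
            if pretty is not None:
                try:
                    with open(r'output\output_xml.txt', 'w') as writer:
                        writer.write(pretty)
                except OSError as exc:
                    print('Writing to file unsuccessful:', exc)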

If the only way to parse the file successfully is to fix the unclosed blocks, then I guess it is a good idea to skip the broken lines instead. Thanks everyone :)
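To make the "skip broken lines" idea concrete, here is a minimal sketch that parses each log line on its own and skips the ones a strict parser rejects. It assumes every log entry sits on a single line; the function name `pretty_print_lines`, the temporary `<entry>` wrapper element, and the sample lines are all made up for illustration. Note that the lines posted above would probably all be skipped by this approach, since they also contain an attribute without a value (`qwerty`) and mismatched closing tags.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def pretty_print_lines(lines):
    """Pretty-print each line separately; return (pretty_blocks, skipped_line_numbers)."""
    pretty_blocks = []
    skipped = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        # Ignore blank lines and the bare <root> / </root> wrapper lines.
        if not line or line in ("<root>", "</root>"):
            continue
        try:
            # One log line holds several sibling elements, so wrap it in a
            # temporary <entry> element (hypothetical name) to give it a single root.
            doc = xml.dom.minidom.parseString(f"<entry>{line}</entry>")
        except ExpatError:
            skipped.append(number)  # ill-formed line: record its number and move on
            continue
        pretty_blocks.append(doc.documentElement.toprettyxml(indent="  "))
    return pretty_blocks, skipped


# Hypothetical, simplified sample: the third line never closes <identifier>.
sample = [
    "<root>",
    '<time>10:27:49</time><request id="1"><objectA>123321</objectA></request>',
    '<time>10:27:50</time><response id="1"><identifier>123321<objectA>1</objectA></response>',
    "</root>",
]
blocks, skipped = pretty_print_lines(sample)
print("".join(blocks))
print("Skipped lines:", skipped)
```

Keeping the skipped line numbers makes it possible to report or inspect the broken entries later instead of silently dropping them.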

  • The rules of XML are strict. If the file is broken (ill-formed), it is not XML. You might find something useful here: https://stackoverflow.com/q/44765194/407651 – mzjn Aug 09 '21 at 13:38
  • It's very broken. The only resemblance to XML is that there are some angle brackets around. Probably best to treat it as a random ad-hoc log file format and forget about any resemblance to XML. – Michael Kay Aug 09 '21 at 13:49
  • Thank you, the article that @mzjn linked really helped. I have used BeautifulSoup in my code and now the problem is solved. Thanks a lot! – ginger_beard Aug 10 '21 at 11:26
