1

I have a dummy XML file:

<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="abc">
<inside>
  <ok>xyz</ok>
</inside>
</hello>
<?xml version="1.0" encoding="UTF-8"?>
  <xyz xmlns="acxd">
  </xyz>
<?xml version="1.0" encoding="UTF-8"?>
<zz xmlns="zmrt">
</zz>
]]>]]>

I am trying to parse this XML file using the following code:

import xml.etree.ElementTree as ET
mytree = ET.parse(temp_xml)

The error I am getting is "ParseError: junk after document element: line 7, column 0". I tried removing ']]>]]>' (i.e. line 7), but I still get the same kind of error: "ParseError: junk after document element: line 8, column 0". Is there a way to deal with this error, or to skip the lines that contain junk data?


Shaji Thorn Blue
  • I'm not completely familiar with XML, but can you have multiple documents in one file? – clubby789 Sep 15 '20 at 14:25
  • @JammyDodger: You're familiar enough to suspect correctly the issue. Only a single root element may exist in a well-formed XML document. See [my answer below](https://stackoverflow.com/a/63904076/290085) for further details. – kjhughes Sep 15 '20 at 14:38

1 Answer

2

An XML document may have only a single root element. Yours has three and is therefore not well-formed. If you wish to parse it using XML tools, you'll first have to separate the root elements into their own documents, either manually or programmatically.

Note that an XML document also can have at most a single XML declaration (<?xml version="1.0" encoding="UTF-8"?>), and if it exists, it must be at the top of the file.
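As a rough illustration of separating the documents programmatically, here is a minimal Python sketch. It assumes that every embedded document starts with its own XML declaration and that the trailing ']]>]]>' marker never occurs inside real content; the helper name parse_concatenated and both of those assumptions are mine, not part of the question.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: each embedded document begins with an XML declaration.
XML_DECL = re.compile(r"<\?xml[^>]*\?>")
# Assumption: this end marker is pure framing and never appears in real content.
TRAILER = "]]>]]>"


def parse_concatenated(text):
    """Split text on XML declarations and parse each piece separately.

    Hypothetical helper: returns a list of root Elements, one per document.
    """
    text = text.replace(TRAILER, "")
    chunks = XML_DECL.split(text)
    # Skip empty pieces (e.g. the one before the first declaration).
    return [ET.fromstring(chunk.strip()) for chunk in chunks if chunk.strip()]


sample = """<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="abc">
<inside>
  <ok>xyz</ok>
</inside>
</hello>
<?xml version="1.0" encoding="UTF-8"?>
  <xyz xmlns="acxd">
  </xyz>
<?xml version="1.0" encoding="UTF-8"?>
<zz xmlns="zmrt">
</zz>
]]>]]>
"""

roots = parse_concatenated(sample)
for root in roots:
    print(root.tag)  # namespaced tags: {abc}hello, {acxd}xyz, {zmrt}zz
```

To use it on a file, read the whole file into a string first and pass that string in, rather than calling ET.parse on the file directly. If the documents are not reliably separated by declarations, a different splitting rule would be needed.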


kjhughes