
Basically, I download a few XML files and then append them with ElementTree. The problem is that the final file contains these things:

<<?xml version="1.0" encoding="UTF-8" standalone="yes"?> - at the start of each new XML file
...
</product_info> /><product_info> ...

where product_info is the actual closing tag and the /> is what is messing everything up.

I fixed the first part by removing the XML declaration in the original xml file with:

replace('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><','')
#I remove a closing bracket at the end because I cannot remove the opening bracket as it is not in the original file

I suspect the problem is that, for some reason, each XML file is enclosed in some tag before it is appended?

When I check the 'ET.SubElement(root,response_xml)' this is what prints:

<Element 'product_info article_id="0006303562403"...'

Could the tag be the problem?

Denis
    If your original files had that content, they weren't really XML in the first place. If the input files are real XML but the output files aren't, we'd need to see a [mre] with the shortest possible code (and sample input document) that makes that happen to speak to how to fix it. – Charles Duffy Jul 26 '20 at 17:09

1 Answer

Your file won't qualify as XML if it's not well-formed, and you generally cannot use libraries designed to parse XML on data that fails to meet the definition of XML.
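
One way to see whether a given string meets that definition is to hand it to a parser and watch for an error. Below is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` (the library the question uses). The helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Illustrative sample data, not taken from the question's real files.
good = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<product_info article_id="1"/>'
)
# Same document with a stray "<" before the declaration.
stray = "<" + good
# Two root elements side by side.
two_roots = "<product_info/><product_info/>"

print(is_well_formed(good), is_well_formed(stray), is_well_formed(two_roots))
```

Only the first string should parse. The other two fail for reasons that fall under the rules of well-formedness.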

Examples of failures to be well-formed include:

  • Having any content before the XML declaration.
  • Having multiple root elements.
  • Not properly closing an element.
  • Using characters not allowed in component names. (XML attribute names may not start with a ', for example.)
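
The last case looks like what the question's printed element shows. `ET.SubElement(root, response_xml)` treats its second argument as a tag name, so passing a whole document string yields an element whose "name" is the entire document, and serializing it wraps that text in `<` and ` />`. A hedged sketch of one alternative, assuming each downloaded document is already well-formed, is to parse each string with `ET.fromstring` and append the resulting element to a single root. The wrapper tag `products`, the function name `merge_documents`, and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_documents(xml_strings, root_tag="products"):
    """Parse each XML string and collect the parsed roots under one new root."""
    root = ET.Element(root_tag)
    for text in xml_strings:
        # fromstring parses the whole document (declaration included)
        # and returns its root element, which can then be appended.
        root.append(ET.fromstring(text))
    return root


# Illustrative stand-ins for the downloaded documents.
declaration = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
downloads = [
    declaration + '<product_info article_id="1"><name>A</name></product_info>',
    declaration + '<product_info article_id="2"><name>B</name></product_info>',
]

merged = merge_documents(downloads)
print(ET.tostring(merged, encoding="unicode"))
```

Because the declarations are consumed during parsing, none of them ends up inside the merged tree, so no string replacement is needed beforehand.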

You must fix the code that violates the rules of well-formedness, repair the data manually, or see this Q/A for other options:

kjhughes