I am trying to read one of Italy's electronic invoice files.

The header is rather unusual (to me, at least):

<n0:FatturaElettronica versione="FPR12" xmlns:n0="http://ivaservizi.agenziaentrate.gov.it/docs/xsd/fatture/v1.2" xmlns:prx="urn:sap.com:proxy:RTP:/1SAI/TAS2413580D04BDBC7A8515:750" xmlns:n1="http://www.w3.org/2000/09/xmldsig#">
 <FatturaElettronicaHeader> 

I am trying to retrieve the value of xmlns:prx. root.tag returns {http://ivaservizi.agenziaentrate.gov.it/docs/xsd/fatture/v1.2}FatturaElettronica and root.attrib returns {'versione': 'FPR12'}.

This "xmlns:prx" looks like an attribute but remains invisible to etree. How can I access these values?

JCF
  • Those are [namespace declarations](https://www.w3.org/TR/REC-xml-names/#ns-decl). See this [similar question here](https://stackoverflow.com/questions/42987353/list-namespace-definitions-in-an-xml-document-with-elementtree) for getting prefix/uri values. – Daniel Haley Mar 25 '22 at 21:17

1 Answer

Thanks to the links provided by Daniel Haley, I could adjust my code as follows:

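import xml.etree.ElementTree as ET  # assumed import; lxml.etree also supports 'start-ns'
# Each 'start-ns' event yields a (prefix, uri) tuple for one namespace declaration
my_namespaces = [node for _, node in ET.iterparse(file, events=['start-ns'])]
print(my_namespaces)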

It gives the list of namespaces: [('n0', 'http://ivaservizi.agenziaentrate.gov.it/docs/xsd/fatture/v1.2'), ('prx', 'urn:sap.com:proxy:RTP:/1SAI/TAS2413580D04BDBC7A8515:750'), ('n1', 'http://www.w3.org/2000/09/xmldsig#')]

The individual values can easily be accessed, e.g.:

print(f"Source System: {my_namespaces[1][1][18:21]}") 

will print:
Source System: RTP
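
The fixed positions above (index 1 in the list, characters 18 to 21) only work for this exact header. A slightly more tolerant variant, offered as a sketch rather than a definitive approach, looks the URI up by prefix and splits it on colons. The sample XML is trimmed from the question's header; the helper name `read_namespaces` is hypothetical, and the idea that the fourth colon-separated field holds the source system is an assumption based on this single example.

```python
import io
import xml.etree.ElementTree as ET

# Sample header modelled on the one in the question (body trimmed).
SAMPLE = b'''<n0:FatturaElettronica versione="FPR12" xmlns:n0="http://ivaservizi.agenziaentrate.gov.it/docs/xsd/fatture/v1.2" xmlns:prx="urn:sap.com:proxy:RTP:/1SAI/TAS2413580D04BDBC7A8515:750" xmlns:n1="http://www.w3.org/2000/09/xmldsig#">
 <FatturaElettronicaHeader/>
</n0:FatturaElettronica>'''


def read_namespaces(source):
    """Return a {prefix: uri} dict of the namespace declarations in source.

    Hypothetical helper: 'source' may be a filename or a binary file object.
    """
    # Each 'start-ns' event carries a (prefix, uri) tuple.
    return dict(node for _, node in ET.iterparse(source, events=['start-ns']))


namespaces = read_namespaces(io.BytesIO(SAMPLE))

# Look up by prefix instead of list position.
prx_uri = namespaces['prx']

# Assumption: the source system is the fourth colon-separated field.
source_system = prx_uri.split(':')[3]
print(f"Source System: {source_system}")
```

Looking up by prefix keeps working if the declarations appear in a different order, and splitting on ':' avoids counting characters by hand. If a file declares the same prefix more than once, the dict keeps only the last URI, so the list form is safer in that case.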

JCF