First of all, I'm an R programmer. My team needs to translate my R script to Python in order to extract some data from an XML file and convert it to JSON.
According to the documentation, and particularly this answer:
I've done the following:
OPTION 1
import xml.etree.ElementTree
e = xml.etree.ElementTree.parse('boleta1A.xml').getroot()
for atype in e.findall('cbc:ID'):
    print(atype.text)
This prints nothing; I'm not getting any results.
OPTION 2
import xml.etree.ElementTree as ET
tree = ET.parse('boleta1A.xml')
root = tree.getroot()
root.findall("./sac:AdditionalMonetaryTotal/cbc:ID").text
Here, I'm getting:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-29-2eefd6f96456>", line 5, in <module>
    root.findall("./sac:AdditionalMonetaryTotal/cbc:ID").text
  File "C:\ProgramData\Anaconda3\lib\xml\etree\ElementPath.py", line 304, in findall
    return list(iterfind(elem, path, namespaces))
  File "C:\ProgramData\Anaconda3\lib\xml\etree\ElementPath.py", line 283, in iterfind
    token = next()
  File "C:\ProgramData\Anaconda3\lib\xml\etree\ElementPath.py", line 83, in xpath_tokenizer
    raise SyntaxError("prefix %r not found in prefix map" % prefix)
  File "<string>", line unknown
SyntaxError: prefix 'sac' not found in prefix map
I think I need to pass the namespaces to findall here, but I don't understand from the documentation why they are needed or how to supply them:
https://docs.python.org/2.7/library/xml.etree.elementtree.html#parsing-xml-with-namespaces
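A minimal sketch of what that documentation section appears to describe: `findall` accepts a second argument, a dict that maps each prefix used in the path to its namespace URI. The `ns` dict and the inline `xml_text` sample are illustrative stand-ins trimmed from the file shown below, not the full document. Two further assumptions are marked in the comments: the path uses `.//` because the target element is nested several levels under the root, and `.text` is read per element because `findall` returns a list.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the invoice structure (not the full file).
xml_text = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
    xmlns:sac="urn:sunat:names:specification:ubl:peru:schema:xsd:SunatAggregateComponents-1">
  <ext:UBLExtensions><ext:UBLExtension><ext:ExtensionContent>
    <sac:AdditionalInformation>
      <sac:AdditionalMonetaryTotal>
        <cbc:ID>1001</cbc:ID>
        <cbc:PayableAmount currencyID="PEN">388.3</cbc:PayableAmount>
      </sac:AdditionalMonetaryTotal>
      <sac:AdditionalProperty>
        <cbc:ID>1000</cbc:ID>
      </sac:AdditionalProperty>
    </sac:AdditionalInformation>
  </ext:ExtensionContent></ext:UBLExtension></ext:UBLExtensions>
</Invoice>"""

# Prefix -> URI map. The prefix names are local to this dict;
# only the URIs have to match the ones declared in the XML.
ns = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "sac": "urn:sunat:names:specification:ubl:peru:schema:xsd:SunatAggregateComponents-1",
}

root = ET.fromstring(xml_text)

# ".//" searches all descendants; "./" would only look at direct children of <Invoice>.
matches = root.findall(".//sac:AdditionalMonetaryTotal/cbc:ID", ns)

# findall returns a list, so .text is read from each element rather than from the list.
ids = [el.text for el in matches]
print(ids)
```

With the real file, `ET.parse('boleta1A.xml').getroot()` would take the place of `ET.fromstring(xml_text)`.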
XML FILE:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ccts="urn:un:unece:uncefact:documentation:2" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2" xmlns:sac="urn:sunat:names:specification:ubl:peru:schema:xsd:SunatAggregateComponents-1" xmlns:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 ..\xsd\maindoc\UBLPE-Invoice-2.0.xsd" xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ext:UBLExtensions>
    <ext:UBLExtension>
      <ext:ExtensionContent>
        <sac:AdditionalInformation>
          <sac:AdditionalMonetaryTotal>
            <cbc:ID>1001</cbc:ID>
            <cbc:PayableAmount currencyID="PEN">388.3</cbc:PayableAmount>
          </sac:AdditionalMonetaryTotal>
          <sac:AdditionalProperty>
            <cbc:ID>1000</cbc:ID>
            <cbc:Value><![CDATA[CUATROCIENTOS SESENTA Y UN CON 56 /100 NUEVOS SOLES]]></cbc:Value>
          </sac:AdditionalProperty>
        </sac:AdditionalInformation>
      </ext:ExtensionContent>
    </ext:UBLExtension>
  </ext:UBLExtensions>
</Invoice>
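For context, the extract-and-convert step mentioned at the top could be sketched roughly as follows. This is only a sketch under assumptions: the helper name `monetary_totals`, the output keys `id`, `payable_amount`, and `currency`, and the trimmed inline sample are all made up for illustration. The real script would parse `boleta1A.xml` and may need other fields.

```python
import json
import xml.etree.ElementTree as ET

# Prefix -> URI map for the two namespaces used in the paths below.
ns = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "sac": "urn:sunat:names:specification:ubl:peru:schema:xsd:SunatAggregateComponents-1",
}

# Trimmed, illustrative sample (assumption: same nesting as the file above).
xml_text = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    xmlns:sac="urn:sunat:names:specification:ubl:peru:schema:xsd:SunatAggregateComponents-1">
  <sac:AdditionalInformation>
    <sac:AdditionalMonetaryTotal>
      <cbc:ID>1001</cbc:ID>
      <cbc:PayableAmount currencyID="PEN">388.3</cbc:PayableAmount>
    </sac:AdditionalMonetaryTotal>
  </sac:AdditionalInformation>
</Invoice>"""


def monetary_totals(root):
    """Hypothetical helper: one dict per sac:AdditionalMonetaryTotal element."""
    rows = []
    for total in root.iterfind(".//sac:AdditionalMonetaryTotal", ns):
        amount = total.find("cbc:PayableAmount", ns)
        rows.append({
            "id": total.findtext("cbc:ID", namespaces=ns),
            # Guard against a missing amount element instead of failing on None.
            "payable_amount": float(amount.text) if amount is not None else None,
            "currency": amount.get("currencyID") if amount is not None else None,
        })
    return rows


result = monetary_totals(ET.fromstring(xml_text))
print(json.dumps(result, indent=2))
```

Building plain dicts first and serializing with `json.dumps` at the end keeps the XML traversal separate from the output format.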
PLUS: I'm using a Jupyter Notebook. Would you recommend this, or is there something in the Python world more similar to RStudio?
Thank you!