I am processing an XML feed with BeautifulSoup, but for some reason it is skipping part of the param tag. I already tried changing the parser (html.parser / html5lib / lxml), but all of them give the same output.
Can someone help with this?
Original XML file:
<SHOPITEM>
<PRODUCTNO>DK28-SLV</PRODUCTNO>
<PARAM>
<PARAM_NAME>Způsob komunikace</PARAM_NAME>
<VAL>WiFi pro internetové připojení</VAL>
</PARAM>
</SHOPITEM>
Output from BeautifulSoup:
<shopitem>
<productno>DK28-SLV</productno>
<param_name>Způsob komunikace</param_name>
<val>WiFi pro internetové připojení</val>
<param/>
</shopitem>
Desired output:
<shopitem>
<productno>DK28-SLV</productno>
<param> -------> This one is missing
<param_name>Způsob komunikace</param_name>
<val>WiFi pro internetové připojení</val>
</param>
</shopitem>
My code:
from bs4 import BeautifulSoup
import requests
source = requests.get("my-xml-feed-url").text
soup = BeautifulSoup(source, "lxml")
product = soup.find("shopitem")
for product in soup.find_all("shopitem"):
    productno = product.find("productno")
    print(productno)
    param = product.find("param")
    print(param)
    param_name = product.find("param_name")
    print(param_name)
    param_val = product.find("val")
    print(param_val)
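The behaviour can be reproduced without the network request. The sketch below feeds the snippet from the original XML to the "html.parser" backend as an inline string (the SNIPPET name is only for illustration). A possible explanation, stated here as an assumption about what the parsers do: HTML defines param as a void element that cannot have children, so an HTML parser closes it immediately and the nested tags end up next to it instead of inside it.

```python
from bs4 import BeautifulSoup

# Inline copy of the snippet from the feed; SNIPPET is an illustrative name.
SNIPPET = """<SHOPITEM>
<PRODUCTNO>DK28-SLV</PRODUCTNO>
<PARAM>
<PARAM_NAME>Způsob komunikace</PARAM_NAME>
<VAL>WiFi pro internetové připojení</VAL>
</PARAM>
</SHOPITEM>"""

# Parse with an HTML parser, as in the code above.
soup = BeautifulSoup(SNIPPET, "html.parser")

param = soup.find("param")
param_name = soup.find("param_name")

# Children that ended up inside <param>.
print(list(param.children))
# Name of the tag that actually contains <param_name>.
print(param_name.parent.name)
```

If the assumption holds, param has no children and param_name is attached directly to shopitem, which matches the output shown above.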
UPDATE: I tested changing the parser to "xml".
It partly helped, and the tag is now shown correctly. But the output is now broken in a different place: roughly the second half of the XML is OK, but the first half is not shown at all.
Original XML:
<PARAM>
<PARAM_NAME>Funkce alarmu</PARAM_NAME>
<VAL>Ano, do mobilní aplikace</VAL>
</PARAM>
Output beginning:
/PARAM_NAME>
<VAL>Ano, do mobilní aplikace</VAL>
</PARAM>
This is where the output starts, so for some reason everything in the XML before this point is cut off. The XML structure looks the same before and after this point, so I see no reason for this.
Further output is OK:
<PARAM>
<PARAM_NAME>
Úhel záběru
</PARAM_NAME>
<VAL>
60°
</VAL>
</PARAM>
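One way to check whether the first half is really missing from the parsed tree, or only from what gets printed, is to count elements instead of reading the printed output. The sketch below is only a diagnostic under assumptions: it swaps in the standard library's xml.etree.ElementTree for BeautifulSoup, and it uses a small inline feed with a hypothetical SHOP root element in place of the real feed. The summarize helper and the FEED name are illustrative, not part of my code.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the real feed; the <SHOP> root element is an assumption.
FEED = """<SHOP>
<SHOPITEM>
<PRODUCTNO>DK28-SLV</PRODUCTNO>
<PARAM>
<PARAM_NAME>Funkce alarmu</PARAM_NAME>
<VAL>Ano, do mobilní aplikace</VAL>
</PARAM>
<PARAM>
<PARAM_NAME>Úhel záběru</PARAM_NAME>
<VAL>60°</VAL>
</PARAM>
</SHOPITEM>
</SHOP>"""


def summarize(xml_text):
    """Return (number of SHOPITEMs, number of PARAMs, first PARAM_NAME text)."""
    root = ET.fromstring(xml_text)
    items = root.findall(".//SHOPITEM")
    params = root.findall(".//PARAM")
    first_name = params[0].findtext("PARAM_NAME") if params else None
    return len(items), len(params), first_name


print(summarize(FEED))
```

If the counts for the real feed match what the file contains, the parse is complete and the cut-off is more likely a limit of whatever displays the output than a parsing problem.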