I have an XML file with content like this:
content = """<?xml version="1.0" ?>
<passage>
<title>Aggrecan Turnover</title>
<author>Winsz-Szczotka K,Kuźnik-Trocha K,Komosińska-Vassev K,Jura-Półtorak A,Olczyk K</author>
<source>Disease markers</source>
<description>
xxxxxxx
</description>
<filename>26924871.xml</filename>
<passage_url>http://www.ncbi.nlm.nih.gov/pubmed/26924871</passage_url>
<received_date>2016-03-02</received_date>
<parameter_date>2016-02-29</parameter_date>
</passage>"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(content, "xml")  # "xml" selects lxml's XML parser
print(soup.find("author"))
Result:
On Windows:
<author>Winsz-Szczotka K,Kuźnik-Trocha K,Komosińska-Vassev K,Jura-Półtorak A,Olczyk K</author>
On Linux:
the node is not found.
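Because the same code behaves differently on the two machines, a reasonable first step is to compare the environments. The sketch below is a hypothetical diagnostic helper (`parser_versions` is not part of any library, and the script assumes Python 3.6+): it reports the Python version, the locale's preferred encoding, and the installed bs4, lxml and libxml2 versions, using `None` for anything that is not installed. Running it on both Windows and Linux makes version or encoding differences easy to spot.

```python
import locale
import platform
import sys


def parser_versions():
    """Hypothetical helper: collect versions relevant to BeautifulSoup's "xml" parser."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.system(),
        # Encoding Python uses by default when opening text files on this machine
        "file_encoding": locale.getpreferredencoding(False),
        "bs4": None,
        "lxml": None,
        "libxml2": None,
    }
    try:
        import bs4
        info["bs4"] = bs4.__version__
    except ImportError:
        pass  # bs4 not installed; leave as None
    try:
        from lxml import etree
        # lxml.etree exposes both its own and the underlying libxml2 version as tuples
        info["lxml"] = ".".join(map(str, etree.LXML_VERSION))
        info["libxml2"] = ".".join(map(str, etree.LIBXML_VERSION))
    except ImportError:
        pass  # lxml not installed; leave as None
    return info


if __name__ == "__main__":
    for key, value in parser_versions().items():
        print(f"{key}: {value}")
```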
When I change the <author> node to <author>Winsz-Szczotka</author> (which drops every non-ASCII character, such as ź, ń and ó), the node is found on both Windows and Linux. What causes this?
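One way to narrow this down is to parse the same markup with Python's standard-library `xml.etree.ElementTree`, which does not involve BeautifulSoup or lxml at all. The sketch below assumes Python 3 and uses a trimmed copy of the sample; `find_author` is a hypothetical helper. It checks that the non-ASCII author text is read correctly both from a Unicode string and from the same document encoded as UTF-8 bytes.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (a Python 3 str, i.e. Unicode text).
CONTENT = """<?xml version="1.0" ?>
<passage>
<title>Aggrecan Turnover</title>
<author>Winsz-Szczotka K,Kuźnik-Trocha K,Komosińska-Vassev K,Jura-Półtorak A,Olczyk K</author>
<source>Disease markers</source>
</passage>"""


def find_author(markup):
    """Hypothetical helper: parse markup (str or bytes) and return the <author> text."""
    root = ET.fromstring(markup)
    node = root.find("author")  # direct child of <passage>
    return None if node is None else node.text


# Parse the Unicode string, then the same document as UTF-8 bytes
# (what a file opened in binary mode would contain).
from_str = find_author(CONTENT)
from_bytes = find_author(CONTENT.encode("utf-8"))

print(from_str)
print(from_str == from_bytes)
```

If this works on both machines, the XML itself is well-formed, and the difference more likely lies in how the string reaches BeautifulSoup or in the lxml build. Passing bytes with an explicit encoding, for example `BeautifulSoup(content.encode("utf-8"), "xml", from_encoding="utf-8")`, may be worth comparing on Linux.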
Also, when I switch the parser to html.parser on Linux, it works fine. I'm confused: the content is XML, so why does html.parser work?
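For comparison, BeautifulSoup's "html.parser" backend is built on Python's standard-library `html.parser` module, which is pure Python and does not go through lxml or libxml2. A minimal sketch, assuming Python 3 (the class name `AuthorCollector` is hypothetical), that uses that module directly to pull out the author text:

```python
from html.parser import HTMLParser

SAMPLE = """<?xml version="1.0" ?>
<passage>
<title>Aggrecan Turnover</title>
<author>Winsz-Szczotka K,Kuźnik-Trocha K,Komosińska-Vassev K,Jura-Półtorak A,Olczyk K</author>
</passage>"""


class AuthorCollector(HTMLParser):
    """Hypothetical collector for the text of every <author> element."""

    def __init__(self):
        super().__init__()
        self._in_author = False
        self.authors = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names and does not check them against HTML,
        # so XML-style names such as "passage" or "author" pass straight through.
        if tag == "author":
            self._in_author = True
            self.authors.append("")

    def handle_endtag(self, tag):
        if tag == "author":
            self._in_author = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so append instead of assigning.
        if self._in_author:
            self.authors[-1] += data


collector = AuthorCollector()
collector.feed(SAMPLE)  # the <?xml ...?> declaration is treated as a processing instruction and ignored
collector.close()
print(collector.authors)
```

The trade-off is that this parser treats the document as generic markup: it lower-cases tag names and does not enforce XML rules, so it is forgiving here but less suitable when case-sensitive names or namespaces matter.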
Can anybody explain what is going on? Thanks.