I have an XML file as shown below. A few tags are prefixed with ce, for example <ce:title>. When I run the code below with XPath, <ce:title> is replaced with <title> in the output. I have seen other questions on SO, such as "How to preserve namespace information when parsing HTML with lxml?", but I am not sure where and how to add the namespace details.
Can someone please suggest how I can retain <ce:title> for the XML below?
from lxml import html
from lxml.etree import tostring

with open('102277033304.xml', encoding='utf-8') as file_object:
    xml = file_object.read().strip()

root = html.fromstring(xml)
for element in root.xpath('//item/book/pages/*'):
    html = tostring(element, encoding='utf-8')
    print(html)
XML:
<item>
  <book>
    <pages>
      <page-info>
        <page>
          <ce:title>Chapter 1</ce:title>
          <content>Welcome to Chapter 1</content>
        </page>
        <page>
          <ce:title>Chapter 2</ce:title>
          <content>Welcome to Chapter 2</content>
        </page>
      </page-info>
      <page-fulltext>Published in page 1</page-fulltext>
      <page-info>
        <page>
          <ce:title>Chapter 1</ce:title>
          <content>Welcome to Chapter 1</content>
        </page>
        <page>
          <ce:title>Chapter 2</ce:title>
          <content>Welcome to Chapter 2</content>
        </page>
      </page-info>
      <page-fulltext>Published in page 2</page-fulltext>
      <page-info>
        <page>
          <ce:title>Chapter 1</ce:title>
          <content>Welcome to Chapter 1</content>
        </page>
        <page>
          <ce:title>Chapter 2</ce:title>
          <content>Welcome to Chapter 2</content>
        </page>
      </page-info>
      <page-fulltext>Published in page 3</page-fulltext>
    </pages>
  </book>
</item>
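
For reference, here is a minimal sketch of one direction that may be worth trying. It rests on assumptions that are not confirmed by the file above: that the real document declares the ce prefix somewhere (the URI http://example.com/ce below is a placeholder, not the real one), and that parsing with lxml.etree instead of lxml.html is acceptable. The HTML parser is not namespace-aware, which is likely why the prefix disappears; the XML parser keeps it as long as the prefix is declared. The SAMPLE data and the helper name serialize_pages_children are hypothetical and only for illustration.

```python
from lxml import etree

# Assumption: the ce prefix is declared on the root element.
# The namespace URI is a placeholder; use the one from the real file.
SAMPLE = b"""<item xmlns:ce="http://example.com/ce">
  <book>
    <pages>
      <page-info>
        <page>
          <ce:title>Chapter 1</ce:title>
          <content>Welcome to Chapter 1</content>
        </page>
      </page-info>
      <page-fulltext>Published in page 1</page-fulltext>
    </pages>
  </book>
</item>"""


def serialize_pages_children(xml_bytes):
    """Parse as XML (not HTML) and serialize each direct child of <pages>."""
    root = etree.fromstring(xml_bytes)
    return [
        etree.tostring(element, encoding="utf-8")
        for element in root.xpath("//item/book/pages/*")
    ]


fragments = serialize_pages_children(SAMPLE)
for fragment in fragments:
    # Each fragment is a bytes object; the ce: prefix is kept on <ce:title>.
    print(fragment.decode("utf-8"))
```

If the file does not declare the prefix at all, the XML parser will reject it as not well-formed, so the declaration would have to be added to the input first.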