I have the following XML file:
<p style="1">
A
</p>
<div xml:lang="unknown">
<p style="3">
B
C
</div>
<div xml:lang="English">
<p style="5">
D
</p>
<p style="1">
Picture number 3?
</p>
I just want to get the text between <div xml:lang="unknown"> and </div>.
So I tried this code:
import re

# Read the whole file into one string (the file is closed automatically)
with open("2.xml", "r") as html:
    text = html.read()

lon = re.compile(r'<div xml:lang="unknown">\n(.+)\n</div>', re.MULTILINE)
lon = lon.search(text).group(1)
print(lon)
but it doesn't seem to work.
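One likely explanation, offered as a sketch rather than a definitive fix: re.MULTILINE only changes what ^ and $ match. It does not let "." match a newline. The text inside the "unknown" div spans several lines, so (.+) can never reach "\n</div>" and search() finds nothing. Adding re.DOTALL and making the group non-greedy should fix that. The sketch below embeds the sample from above as a string instead of reading 2.xml, and the function name text_in_unknown_div is just a hypothetical helper for illustration.

```python
import re

# Sample input copied from the question. Reading the real file would be:
#     with open("2.xml") as f:
#         text = f.read()
SAMPLE = """<p style="1">
A
</p>
<div xml:lang="unknown">
<p style="3">
B
C
</div>
<div xml:lang="English">
<p style="5">
D
</p>
<p style="1">
Picture number 3?
</p>
"""


def text_in_unknown_div(text):
    """Return the raw content between <div xml:lang="unknown"> and the next </div>, or None."""
    # re.DOTALL makes "." also match "\n", so the group can span several lines.
    # The non-greedy ".*?" stops at the first "</div>" rather than the last one.
    pattern = re.compile(r'<div xml:lang="unknown">\n(.*?)\n</div>', re.DOTALL)
    match = pattern.search(text)
    return match.group(1) if match else None


inner = text_in_unknown_div(SAMPLE)
print(repr(inner))

# Optionally strip any tags left inside the block and keep the non-empty text lines.
lines = [line for line in re.sub(r"<[^>]+>", "", inner).splitlines() if line.strip()]
print(lines)  # -> ['B', 'C']
```

The pattern assumes the tags sit on their own lines exactly as in the sample; if the real file has extra indentation, replacing each \n in the pattern with \s* would be more forgiving. A regex is workable here partly because the sample is not well-formed XML (the <p style="3"> and the English <div> are never closed), so a strict parser such as xml.etree.ElementTree would likely reject it.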