How can I extract string1#string2 from the line below?
<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>
The # character and the structure of the line are always the same.
I would like to refer you to this gem:
In short, a regex is not the appropriate tool for this job.
Also, have you tried an XML parser instead?
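EDIT:
import xml.etree.ElementTree as ET
# The HTML payload, with the <![CDATA[ ... ]]> wrapper already removed
a = "<html><body><p style=\"margin:0;\">string1#string2</p></body></html>"
root = ET.fromstring(a)
# root is <html>, root[0] is <body>, root[0][0] is <p>
c = root[0][0].text
Out:
c
'string1#string2'
# Split directly on the separator
d = c.split('#')
Out:
d
['string1', 'string2']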
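The snippet above starts from the HTML with the CDATA wrapper already removed. Here is a minimal sketch that starts from the full line in the question instead. The function name `extract_parts` and the two constants are my own, not from any library, and it assumes the payload is well-formed XML, which holds for the sample line but not for arbitrary HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper names, chosen for this illustration only
CDATA_PREFIX = "<![CDATA["
CDATA_SUFFIX = "]]>"

def extract_parts(line):
    """Return the '#'-separated parts of the <p> text in a CDATA-wrapped line."""
    payload = line.strip()
    # Drop the CDATA wrapper if it is present
    if payload.startswith(CDATA_PREFIX) and payload.endswith(CDATA_SUFFIX):
        payload = payload[len(CDATA_PREFIX):-len(CDATA_SUFFIX)]
    # Assumes the payload is well-formed XML; ET.fromstring raises ParseError otherwise
    root = ET.fromstring(payload)
    # Look the element up by path rather than by position
    p = root.find("./body/p")
    if p is None or p.text is None:
        raise ValueError("no <p> text found in line")
    return p.text.split("#")

line = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
print(extract_parts(line))  # ['string1', 'string2']
```

Looking up `./body/p` by name instead of `root[0][0]` keeps working if another element is ever added before the paragraph.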
Simple, buggy, not reliable:
line.replace('<![CDATA[<html><body><p style="margin:0;">', "").replace('</p></body></html>]]>', "").split("#")
If you wish to use a regex:
>>> re.search(r"<p.*?>(.+?)</p>", txt).group(1)
'string1#string2'
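If you go the regex route, a slightly more defensive sketch could look like the following. The names `P_TEXT` and `extract_with_regex` are made up for this example, and it assumes there is a single `<p>` element on the line with no nested tags, as in the question.

```python
import re

# \b keeps the pattern from matching tags such as <pre>;
# [^>]* skips over any attributes of the <p> tag
P_TEXT = re.compile(r"<p\b[^>]*>(.+?)</p>")

def extract_with_regex(txt):
    """Return the '#'-separated parts of the first <p> text, or None if absent."""
    match = P_TEXT.search(txt)
    if match is None:
        # re.search returns None when nothing matches, so check before .group()
        return None
    return match.group(1).split("#")

txt = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
print(extract_with_regex(txt))  # ['string1', 'string2']
```

Checking for `None` avoids the `AttributeError` that `re.search(...).group(1)` raises on a line that does not match.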