-1

How can I extract string1#string2 from the line below?

<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>

The # character is always present, and the structure of the line is always the same.

Ciprian Vintea

4 Answers

1

I would like to refer you to this gem:

In short, a regex is not the appropriate tool for this job.
Have you tried an XML parser instead?

EDIT:

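import xml.etree.ElementTree as ET

# The markup from the question, with the <![CDATA[ ... ]]> wrapper already removed
a = "<html><body><p style=\"margin:0;\">string1#string2</p></body></html>"
root = ET.fromstring(a)   # root is the <html> element
c = root[0][0].text       # first child (<body>), then its first child (<p>)
print(c)                  # string1#string2

# Split on '#' directly, so values that contain spaces stay intact
d = c.split('#')
print(d)                  # ['string1', 'string2']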
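
The snippet above parses the markup after the CDATA wrapper has been removed by hand. A minimal sketch of handling the original line end to end might look like the following. The function name `extract_parts` is hypothetical, and it assumes the line always starts with `<![CDATA[` and ends with `]]>`, as the question states:

```python
import xml.etree.ElementTree as ET

CDATA_PREFIX = "<![CDATA["
CDATA_SUFFIX = "]]>"


def extract_parts(line):
    """Return the '#'-separated strings inside the <p> element (hypothetical helper)."""
    line = line.strip()
    # Remove the CDATA wrapper if present; ElementTree cannot parse it on its own.
    if line.startswith(CDATA_PREFIX) and line.endswith(CDATA_SUFFIX):
        line = line[len(CDATA_PREFIX):-len(CDATA_SUFFIX)]
    root = ET.fromstring(line)        # root is the <html> element
    p = root.find("./body/p")         # look the element up by name, not by index
    if p is None or p.text is None:
        raise ValueError("no <p> text found in line")
    return p.text.split("#")


line = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
parts = extract_parts(line)
print(parts)
```

Looking the element up by path rather than by position means the code fails with a clear error if the structure ever changes, instead of silently returning the wrong node.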
SerialDev
1

Simple, buggy, not reliable:

line.replace('<![CDATA[<html><body><p style="margin:0;">', "").replace('</p></body></html>]]>', "").split("#")
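
One reason the one-liner is unreliable is that `replace` silently does nothing when the surrounding text differs. A rough sketch that checks the fixed prefix and suffix first could look like this; the names `PREFIX`, `SUFFIX`, and `strip_fixed_wrapper` are hypothetical, and it still assumes the exact markup from the question:

```python
PREFIX = '<![CDATA[<html><body><p style="margin:0;">'
SUFFIX = '</p></body></html>]]>'


def strip_fixed_wrapper(line):
    """Split the payload on '#' after verifying the fixed wrapper (hypothetical helper)."""
    line = line.strip()
    # Fail loudly instead of returning garbage when the structure is not as expected.
    if not (line.startswith(PREFIX) and line.endswith(SUFFIX)):
        raise ValueError("line does not match the expected structure")
    payload = line[len(PREFIX):len(line) - len(SUFFIX)]
    return payload.split("#")


line = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
print(strip_fixed_wrapper(line))
```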
unddoch
1
re.search(r'[^>]+#[^<]+', s).group()
zxy
0

If you wish to use a regex:

>>> re.search(r"<p.*?>(.+?)</p>", txt).group(1)
'string1#string2'
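
If the two halves are needed separately, the same idea can be extended with two capture groups. This is only a sketch under the question's assumption that the text sits in a single `<p>` element and contains exactly one `#`; `PATTERN` and `split_line` are hypothetical names:

```python
import re

# Group 1: text before '#', group 2: text after '#', both inside the <p> element.
PATTERN = re.compile(r"<p[^>]*>([^#<]+)#([^<]+)</p>")


def split_line(txt):
    """Return (left, right) around '#', or None if the line does not match."""
    match = PATTERN.search(txt)
    if match is None:
        return None
    return match.group(1), match.group(2)


txt = '<![CDATA[<html><body><p style="margin:0;">string1#string2</p></body></html>]]>'
print(split_line(txt))
```

Returning `None` on a failed match avoids the `AttributeError` that calling `.group()` on a failed `re.search` would raise.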
donkopotamus