This is my string:

content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'

I tried the regular expression below to extract the text between the h5 tags:

   reg = re.search(r'<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>([A-Za-z0-9%s]+)</h5></span></td></tr>' % string.punctuation,content)

It returns exactly what I want.

Is there a more Pythonic way to do this?

Vikas Periyadath
Veera Balla Deva

1 Answer

I don't know whether this qualifies as more Pythonic, but it handles the string as HTML data rather than as plain text.

from lxml import html

content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'
# Parse the fragment as HTML, then collect every text node in document order
HtmlData = html.fromstring(content)
ListData = HtmlData.xpath('//text()')

The h5 text is the last text node, so take the last element of the list:

ListData[-1]
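
If installing lxml is not an option, the same idea (parse the string as HTML instead of matching the whole row with a regex) can be sketched with the standard library's html.parser module. This is only a minimal sketch: the class name H5TextExtractor and its texts attribute are hypothetical names made up for this example, and it assumes the h5 elements are not nested inside each other.

```python
from html.parser import HTMLParser


class H5TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text inside every <h5> element."""

    def __init__(self):
        super().__init__()
        self._inside_h5 = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # Start a new text buffer each time an <h5> opens
        if tag == "h5":
            self._inside_h5 = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "h5":
            self._inside_h5 = False

    def handle_data(self, data):
        # Only keep text that appears between <h5> and </h5>
        if self._inside_h5:
            self.texts[-1] += data


content = '<tr class="cart-subtotal"><th>RTO / Registration office :</th><td><span class="amount"><h5>Yadgiri</h5></span></td></tr>'

parser = H5TextExtractor()
parser.feed(content)
parser.close()

# First <h5> text, or None if the fragment contains no <h5>
office = parser.texts[0] if parser.texts else None
print(office)
```

Unlike ListData[-1], this does not depend on the h5 text being the last text node in the fragment, so it keeps working if more cells are added to the row.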
Srevilo