-5

I want to extract the content between two pieces of text.
For example:
<html><title>lol</title></html> I want to extract what is located between <title> and </title>. Which regular expression do I need?

Dexxxter
  • If it is about html content, why not use a library like [BeautifulSoup4](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) – Vikas Ojha Jun 09 '15 at 10:01
  • Have you tried something? – Rakholiya Jenish Jun 09 '15 at 10:02
  • Using regular expressions to parse HTML (except in the most constrained circumstances) has [unfortunate effects](http://stackoverflow.com/a/1732454/67392). – Richard Jun 09 '15 at 10:03
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – jayelm Jun 09 '15 at 10:04
  • Have you glanced at other posts on the same topic? You could just modify a regular expression to your needs. For example [this](http://stackoverflow.com/questions/15033905/regex-that-extracts-text-between-tags-but-not-the-tags) post. – LordTribual Jun 09 '15 at 10:08

1 Answer

2

There are better tools for this than regular expressions. Have a look at HTMLParser, which is part of Python's standard library.
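As a rough illustration of the parser-based approach, here is a minimal sketch. It assumes Python 3, where the class lives in the `html.parser` module; the names `TagTextExtractor` and `get_tag_text` are made up for this example and are not part of the library.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collects the text found inside every occurrence of one tag.

    Hypothetical helper for illustration only.
    """

    def __init__(self, tagname):
        super().__init__()
        self.tagname = tagname.lower()  # the parser reports tag names in lowercase
        self.depth = 0                  # > 0 while we are inside the target tag
        self.results = []               # one string per matched element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tagname:
            self.depth += 1
            if self.depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == self.tagname and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer))

    def handle_data(self, data):
        # Text is kept only while inside the target tag; text of nested
        # child tags is included as well.
        if self.depth > 0:
            self._buffer.append(data)


def get_tag_text(tagname, html):
    """Return a list with the text content of each <tagname> element."""
    parser = TagTextExtractor(tagname)
    parser.feed(html)
    parser.close()
    return parser.results


print(get_tag_text("title", "<html><title>lol</title></html>"))  # ['lol']
```

Unlike a regular expression, this keeps working when the opening tag carries attributes or is written in a different case.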

EDIT: But if you want to use regular expressions anyway:

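import re

def get_tag_body(tagname, text):
    # Escape the tag name and allow optional attributes in the opening tag.
    name = re.escape(tagname)
    regexp = r'<%s\b[^>]*>(.*?)</%s\s*>' % (name, name)
    rx_obj = re.search(regexp, text, re.IGNORECASE | re.DOTALL)
    # Return the first match's body, or None if the tag is not present
    # (re.search returns None in that case, so .groups() would raise).
    return rx_obj.group(1) if rx_obj else None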
DanteVoronoi