I want to extract the content located between two known pieces of text.
For example:
<html><title>lol</title></html>
I want to extract what is located between <title> and </title>.
Which regular expression do I need?
Dexxxter
- If it is about html content, why not use a library like [BeautifulSoup4](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) – Vikas Ojha Jun 09 '15 at 10:01
- Have you tried something? – Rakholiya Jenish Jun 09 '15 at 10:02
- Using regular expressions to parse HTML (except in the most constrained circumstances) has [unfortunate effects](http://stackoverflow.com/a/1732454/67392). – Richard Jun 09 '15 at 10:03
- http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – jayelm Jun 09 '15 at 10:04
- Have you glanced at other posts on the same topic? You could just modify a regular expression to your needs. For example [this](http://stackoverflow.com/questions/15033905/regex-that-extracts-text-between-tags-but-not-the-tags) post. – LordTribual Jun 09 '15 at 10:08
1 Answer
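You can use better tools than regular expressions for this. Read about HTMLParser, which ships with the standard library (the `HTMLParser` module in Python 2, `html.parser` in Python 3).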
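As a rough illustration of the parser-based approach, here is a minimal sketch that assumes Python 3, where the class lives in `html.parser`. The names `TagTextExtractor` and `get_tag_text` are made up for this example and are not part of any library. It only collects the text of the first matching tag and does not try to handle a tag nested inside itself.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text inside the first occurrence of one tag."""

    def __init__(self, tagname):
        super().__init__()
        self.tagname = tagname.lower()  # HTMLParser reports tag names in lowercase
        self.inside = False  # True while we are between the opening and closing tag
        self.done = False    # True once the first matching tag has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tagname and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.tagname and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def get_tag_text(tagname, html):
    """Return the text of the first <tagname> element, or None if none was closed."""
    parser = TagTextExtractor(tagname)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


print(get_tag_text("title", "<html><title>lol</title></html>"))  # prints: lol
```

Unlike a regular expression, this does not care about attributes on the opening tag or about the case of the tag name, since the parser normalizes both before calling the handlers.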
EDIT:
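But if you want to use regular expressions:

    import re

    def get_tag_body(tagname, text):
        # Escape the tag name so it is always matched literally.
        name = re.escape(tagname)
        regexp = r'<%s>(.*?)</%s>' % (name, name)
        rx_obj = re.search(regexp, text, re.IGNORECASE | re.DOTALL)
        # re.search returns None when the tag is not found.
        return rx_obj.group(1) if rx_obj else None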

DanteVoronoi