Regular Expressions (extraction values)

Question

I want to extract the content between two pieces of text.
For example, given:
<html><title>lol</title></html>
I want to extract what is located between <title> and </title>. Which regular expression do I need?

If it is about HTML content, why not use a library like [BeautifulSoup4](http://www.crummy.com/software/BeautifulSoup/bs4/doc/)? — Vikas Ojha, Jun 09 '15 at 10:01
Using regular expressions to parse HTML (except in the most constrained circumstances) has [unfortunate effects](http://stackoverflow.com/a/1732454/67392). — Richard, Jun 09 '15 at 10:03
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — jayelm, Jun 09 '15 at 10:04
Have you glanced at other posts on the same topic? You could just modify a regular expression to your needs. For example [this](http://stackoverflow.com/questions/15033905/regex-that-extracts-text-between-tags-but-not-the-tags) post. — LordTribual, Jun 09 '15 at 10:08

DanteVoronoi · Answer 1 · 2015-06-09T11:51:25.837

2

You can use better tools than regular expressions. Read about HTMLParser.

EDIT: But if you want to use regular expressions:
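
As a rough illustration of the HTMLParser route, here is a minimal sketch. It assumes Python 3, where the class lives in the standard-library module `html.parser`. The names `TagTextExtractor` and `extract_tag_text` are made up for this example and are not part of any library.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collects the text found inside each occurrence of one tag.

    Hypothetical helper class, written only for this example.
    """

    def __init__(self, tagname):
        super().__init__()
        # HTMLParser reports tag names in lower case, so normalize ours too.
        self.tagname = tagname.lower()
        self.depth = 0       # > 0 while we are inside the target tag
        self.results = []    # one string per outermost occurrence
        self._chunks = []    # text pieces of the occurrence being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tagname:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tagname and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._chunks))
                self._chunks = []

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


def extract_tag_text(tagname, html):
    """Return a list with the text content of every <tagname> element."""
    parser = TagTextExtractor(tagname)
    parser.feed(html)
    parser.close()
    return parser.results


print(extract_tag_text("title", "<html><title>lol</title></html>"))
```

Unlike a hand-written pattern, the parser copes with attributes and mixed-case tag names without extra work, which is the main reason to prefer it for anything beyond a one-off string.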

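import re

def get_tag_body(tagname, text):
    # Return the text between <tagname> and </tagname>, or None if the tag is absent.
    # re.escape guards against regex metacharacters in tagname.
    tag = re.escape(tagname)
    regexp = r'<%s>(.*?)</%s>' % (tag, tag)
    # IGNORECASE also matches <TITLE>; DOTALL lets the body span several lines.
    rx_obj = re.search(regexp, text, re.IGNORECASE | re.DOTALL)
    return rx_obj.group(1) if rx_obj else None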
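
The pattern above only matches an opening tag written exactly as `<tagname>` and stops at the first occurrence. If the tag may carry attributes or appear more than once, a variant along these lines may help. This is only a sketch: the name `find_tag_bodies` is made up for this example, and it still breaks on nested tags of the same name, so it is no substitute for a real parser.

```python
import re


def find_tag_bodies(tagname, text):
    """Return the body of every <tagname ...>...</tagname> pair in text."""
    tag = re.escape(tagname)
    # \b stops "<t" from matching "<title>"; [^>]* skips any attributes;
    # \s* tolerates whitespace before the closing ">".
    pattern = r'<%s\b[^>]*>(.*?)</%s\s*>' % (tag, tag)
    # IGNORECASE matches <TITLE> too; DOTALL lets a body span several lines.
    return re.findall(pattern, text, re.IGNORECASE | re.DOTALL)


print(find_tag_bodies("title", '<html><title lang="en">lol</title></html>'))
```

`re.findall` returns an empty list when nothing matches, so the caller does not need a separate check for a missing tag.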

edited Jun 09 '15 at 11:51

answered Jun 09 '15 at 10:02

