
I have a string:

myString = '<p>Phone Number:</p><p>706-878-8888</p>'

I'm trying to use a regex to strip out all HTML tags, in this case the paragraph tags.

Thanks!

Hikalea
  • Don't use Regex to parse (X)HTML. Use a parser. BeautifulSoup comes to mind. – g.d.d.c Jan 30 '12 at 19:37
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Hamish Jan 30 '12 at 19:37
  • possible duplicate of http://stackoverflow.com/questions/8703017/remove-sub-string-by-using-python/8703078#8703078 – juliomalegria Jan 30 '12 at 19:44
  • I would link directly to the answer of that question @Hamish: http://stackoverflow.com/a/1732454/147129 :-P – GaretJax Jan 30 '12 at 19:45

2 Answers


Use re.sub:

>>> import re
>>> re.sub('<[^>]+>', '', '<p>Phone Number:</p><p>706-878-8888</p>')
'Phone Number:706-878-8888'

Using re is a good solution if you just want to remove tags. But if you want to do anything more complicated (anything that involves actually parsing the HTML), I suggest you look into BeautifulSoup.
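
To make the trade-off concrete, here is a minimal sketch that wraps the same pattern in a helper and shows one input where it goes wrong. The names `TAG_RE` and `strip_tags` are made up for illustration and are not part of `re`.

```python
import re

# Matches anything that looks like a tag: '<', one or more non-'>' characters, then '>'.
TAG_RE = re.compile(r'<[^>]+>')

def strip_tags(html):
    """Remove every tag-like substring from html (hypothetical helper)."""
    return TAG_RE.sub('', html)

my_string = '<p>Phone Number:</p><p>706-878-8888</p>'
print(strip_tags(my_string))  # Phone Number:706-878-8888

# The pattern stops at the first '>', so a '>' inside an attribute value
# cuts the tag short and leaves part of it behind.
print(repr(strip_tags('<a title="a > b">link</a>')))  # ' b">link'
```

For simple, well-behaved markup like the string in the question this is probably enough; the second call is the kind of input where a real parser becomes the safer choice.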

juliomalegria

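Using BeautifulSoup, as suggested in a comment (this is the BeautifulSoup 3 import; with BeautifulSoup 4 it is `from bs4 import BeautifulSoup`):

>>> from BeautifulSoup import BeautifulSoup
>>> myString = '<p>Phone Number:</p><p>706-878-8888</p>'
>>> BeautifulSoup(myString).text
u'Phone Number:706-878-8888'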
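
If installing a third-party package is not an option, the same parser-based idea can be sketched with Python 3's standard-library `html.parser` module. This is a swapped-in alternative, not the BeautifulSoup approach above, and `TextExtractor` and `extract_text` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text nodes of a document and ignores the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        self.chunks.append(data)

def extract_text(html):
    """Return the text content of html with all tags removed (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return ''.join(parser.chunks)

my_string = '<p>Phone Number:</p><p>706-878-8888</p>'
print(extract_text(my_string))  # Phone Number:706-878-8888
```

Unlike the regex, this also decodes character references, so `extract_text('<p>Tom &amp; Jerry</p>')` gives `Tom & Jerry`.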
jcollado