Regex Remove Markup Python

Question

I have a string:

myString = '<p>Phone Number:</p><p>706-878-8888</p>'

I'm trying to use a regex to strip out all HTML tags, in this case paragraph tags.

Thanks!

Don't use Regex to parse (X)HTML. Use a parser. BeautifulSoup comes to mind. — g.d.d.c, Jan 30 '12 at 19:37
possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Hamish, Jan 30 '12 at 19:37
possible duplicate of http://stackoverflow.com/questions/8703017/remove-sub-string-by-using-python/8703078#8703078 — juliomalegria, Jan 30 '12 at 19:44
I would link directly to the answer of that question @Hamish: http://stackoverflow.com/a/1732454/147129 :-P — GaretJax, Jan 30 '12 at 19:45

juliomalegria · Answer 1 · 2012-01-30T19:43:07.650

2

Use re.sub:

>>> import re
>>> re.sub('<[^>]+>', '', '<p>Phone Number:</p><p>706-878-8888</p>')
'Phone Number:706-878-8888'

Using re is a good solution if you just want to remove tags. But if you want to do anything more complicated (involving HTML parsing), I suggest you look into BeautifulSoup.
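For reuse, the same pattern can be wrapped in a small function. This is only a sketch: the names TAG_RE and remove_tags are made up for illustration and do not come from the answer. The second call shows the kind of input where the pattern stops being reliable, which is where a real parser starts to pay off.

```python
import re

# Matches "<", then one or more characters that are not ">", then ">".
TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    """Delete everything that looks like a tag; text between tags is kept."""
    return TAG_RE.sub('', text)

print(remove_tags('<p>Phone Number:</p><p>706-878-8888</p>'))
# Phone Number:706-878-8888

# A ">" inside a quoted attribute value ends the match too early,
# so part of the tag leaks into the result:
print(repr(remove_tags('<a title="a > b">link</a>')))
# ' b">link'
```

For simple, well-formed fragments like the one in the question this is probably enough.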

edited Jan 30 '12 at 19:43

answered Jan 30 '12 at 19:37

juliomalegria


score 2 · Accepted Answer · answered Jan 30 '12 at 19:40

Using BeautifulSoup, as suggested in a comment:

>>> # BeautifulSoup 3 import; with BeautifulSoup 4 it is: from bs4 import BeautifulSoup
>>> from BeautifulSoup import BeautifulSoup
>>> BeautifulSoup(myString).text
u'Phone Number:706-878-8888'
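If installing a third-party package is not an option, a similar result can probably be had with the standard library's html.parser module on Python 3. The sketch below is not part of the answer above; TextExtractor and strip_tags are hypothetical names chosen for illustration. It collects only the text nodes and ignores the tags.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities such as &amp;
        # arrive in handle_data already decoded.
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.chunks)

my_string = '<p>Phone Number:</p><p>706-878-8888</p>'
print(strip_tags(my_string))
# Phone Number:706-878-8888
```

Unlike the regex approach, this also decodes character references, so '<b>a &amp; b</b>' comes out as 'a & b'.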

answered Jan 30 '12 at 19:40

jcollado

39,419
8
102
133

Perfect! I kept trying attribute 'string' instead of text. Much thanks! – Hikalea Jan 30 '12 at 19:44
