I want to use this regular expression in Python:

 <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>

(from RegEx match open tags except XHTML self-contained tags)

def removeHtmlTags(page):
    p = re.compile(r'XXXX')
    return p.sub('', page)

It seems that I cannot directly substitute the complex regular expression into the above function.

Yin Zhu

2 Answers

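Works fine here. You're probably having trouble because of the quotes: the pattern contains both ' and ", so a string literal delimited by either one ends at the first matching quote inside the pattern. Just triple-quote it: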

import re

def removeHtmlTags(page):
    # Triple quotes let the raw string contain both ' and " unescaped.
    p = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')
    return p.sub('', page)
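
As a quick check of the triple-quoted version, here is a minimal sketch. The sample string is made up for illustration and deliberately mixes both quote styles in its attributes:

```python
import re

def removeHtmlTags(page):
    # Raw triple-quoted literal: no escaping needed for ' or " in the pattern.
    p = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')
    return p.sub('', page)

# Hypothetical input, not from the original question.
sample = '''<p class='note' id="x">Hello</p>'''
print(removeHtmlTags(sample))  # Hello
```

Both the single-quoted and the double-quoted attribute are consumed as part of the tag, so only the text content is left.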
Ignacio Vazquez-Abrams

If you need to remove HTML tags, this should do it:

import re

def removeHtmlTags(page):
    # < and > need no escaping, and re.I has no effect on a pattern without letters.
    pattern = re.compile(r'<[^>]+>')
    return pattern.sub('', page)
mcrisc
  • That wasn't the question, but the point of the original regex is to allow for angle brackets within attribute values. – Alan Moore Mar 10 '10 at 14:30
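
To illustrate the point made in the comment, here is a small sketch comparing the two patterns on a tag whose attribute value contains a `>`. The names `QUOTE_AWARE` and `NAIVE` and the sample string are assumptions introduced for this example only:

```python
import re

# Pattern from the question: quoted attribute values are consumed as a unit.
QUOTE_AWARE = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')
# Simpler pattern from the second answer: stops at the first '>'.
NAIVE = re.compile(r'<[^>]+>')

# Hypothetical input with an angle bracket inside an attribute value.
sample = '<a href="x.html" title="a > b">link</a> text'

print(repr(QUOTE_AWARE.sub('', sample)))  # 'link text'
print(repr(NAIVE.sub('', sample)))        # ' b">link text'
```

The simpler pattern ends the tag at the `>` inside the `title` value and leaves the rest of the attribute behind, which is the case the longer regex was written to handle.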