Possible Duplicate:
using python, Remove HTML tags/formatting from a string

I read in an HTML file:

import re

with open("Tree.html", "r") as fi:
    text = fi.read()

I want to delete the HTML header from the text:

text = re.sub("<head>.*?</head>", "", text)

Why does this not work?

Neopugg

1 Answer

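It looks like you're not catching newlines: by default, "." matches any character except a newline, so the pattern fails whenever <head> and </head> are on different lines. You need to add the DOTALL flag, which makes "." match newlines as well.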

text = re.sub("<head>.*?</head>", "", text, flags=re.DOTALL)
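
To see the difference, here is a minimal sketch that runs the same pattern with and without the flag. The sample string is made up for illustration and only stands in for the contents of Tree.html; it assumes the opening tag is a plain <head> with no attributes.

```python
import re

# Hypothetical sample standing in for the contents of Tree.html.
text = "<html>\n<head>\n<title>Tree</title>\n</head>\n<body>Hello</body>\n</html>"

pattern = "<head>.*?</head>"

# Without DOTALL, "." stops at each newline, so the pattern never
# reaches </head> and the text comes back unchanged.
without_flag = re.sub(pattern, "", text)

# With DOTALL, "." also matches newlines, so the whole header is removed.
with_flag = re.sub(pattern, "", text, flags=re.DOTALL)

print(without_flag == text)   # the header is still there
print("<head>" in with_flag)  # the header is gone
```

If the real file uses uppercase tags or attributes on the head element, the pattern would probably need adjusting as well, for example by also passing re.IGNORECASE.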
John Percival Hackworth