Hey guys, does BeautifulSoup strip CSS and JavaScript content? After using

content3 = ''.join(BeautifulSoup(content).findAll(text=True))

I still have them lingering around.

goh

1 Answer

What exactly do you want to strip, all script and style elements? It should be something like:

''.join(BeautifulSoup(content).findAll(
    text=lambda text: text.parent.name not in ("script", "style")))
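
For comparison, here is a minimal sketch of the same filtering idea using only Python's standard library `html.parser` instead of BeautifulSoup. This is a swapped-in technique, not what BeautifulSoup does internally, and the names `TextExtractor` and `visible_text` are made up for illustration. It assumes reasonably well-formed HTML where every `<script>` and `<style>` tag is closed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []      # text chunks we decided to keep
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script>/<style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def visible_text(html):
    """Hypothetical helper: return the text of `html` without CSS or JS."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Called on a page whose `<head>` holds a `<style>` and a `<script>` block, `visible_text` returns only the body text. HTML comments are also dropped here, because the sketch never overrides `handle_comment`.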
Matthew Flaschen
  • That's right. A regex replace could probably do that, but I was wondering whether BeautifulSoup handles that. Or could the "simple version of webstemmer" do that too? – goh Jun 09 '10 at 01:42