Below is a simple HTML fragment that I parse with BeautifulSoup 4. I want to extract only the top-level raw text, hello, without the text inside the nested script tag.
from bs4 import BeautifulSoup
mysoup = BeautifulSoup('<td>hello<script type="text/javascript">world</script></td>')
I've tried several intuitive approaches, but none of them gives the expected result:
mysoup.text # u'helloworld'
mysoup.contents # [<html><body><td>hello<script type="text/javascript">world</script></td></body></html>]
list(mysoup.strings) # [u'hello', u'world']
So how can I get only the top-level text, hello?
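For reference, here is a minimal sketch of one approach that may work: take the td tag and keep only its direct children that are strings, skipping nested tags such as script. The helper name top_level_text is hypothetical, and the 'html.parser' argument is an assumption (the snippet above does not name a parser).

```python
from bs4 import BeautifulSoup, NavigableString

html = '<td>hello<script type="text/javascript">world</script></td>'
# Assumption: the built-in 'html.parser' backend; other parsers may wrap
# the fragment in <html><body> or treat a bare <td> differently.
mysoup = BeautifulSoup(html, 'html.parser')


def top_level_text(tag):
    """Join the strings that are direct children of `tag`.

    Text inside nested tags (here, the <script>) is skipped because
    those children are Tag objects, not NavigableString objects.
    """
    return ''.join(
        child for child in tag.children
        if isinstance(child, NavigableString)
    )


td = mysoup.find('td')
result = top_level_text(td)
print(result)
```

If only the first direct text node is needed, td.find(string=True, recursive=False) looks like a shorter alternative on recent bs4 versions (older releases spell the keyword text=True).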