Why does this not work when there is a <br> in the text? I get no text back.

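import urllib2
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4

# Send a browser-like User-agent header with the request
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
address = 'http://www.bbc.com'
response = opener.open(address)
html = response.read()

soup = BeautifulSoup(html)
snaptext = soup.find('p', attrs={'class': 'displaytext'})
print snaptext.string  # prints None when the <p> contains a <br>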

An example would be:

<p>blahblahblah<br/>blah2blah2blah2<br/></p>

If there's a <br> in the text, the result is None.

Padraic Cunningham
Dzrte4gle

1 Answer

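As you can see below, the problem is your use of .string. A tag's .string is only set when the tag has exactly one child (a single string, or a single tag that itself has a .string). A <br> splits the contents into several children, so .string is None. You probably want .getText() (spelled get_text() in BeautifulSoup 4), which joins all the text inside the tag.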

>>> x = bs.find('div', attrs={'id': 'forum-post-body-183'})
>>> x
<div class="j-comment-body forum-post-body u-typography-format text" id="forum-post-body-183" itemprop="text">
<p>Let's try it! I will only replace Sir Finley with Ysera for late game pressing (and ev. win condition).<br>In edit of this comment i would report about results in casual battles (for start).</br></p>
</div>
>>> x.string
>>> print(x.string)
None
>>> x.getText()
"\nLet's try it! I will only replace Sir Finley with Ysera for late game pressing (and ev. win condition).In edit of this comment i would report about results in casual battles (for start).\n"
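
The same behaviour can be reproduced on markup shaped like the question's example. This is only a minimal sketch, assuming BeautifulSoup 4 (bs4) with the built-in 'html.parser'. The class="displaytext" attribute on the <p> and the variable names are added here for illustration. It also shows two ways to keep the pieces separated instead of running them together.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the question's example (class added for illustration)
html = '<p class="displaytext">blahblahblah<br/>blah2blah2blah2<br/></p>'
soup = BeautifulSoup(html, 'html.parser')
snaptext = soup.find('p', attrs={'class': 'displaytext'})

# The <p> has four children (text, <br/>, text, <br/>), so .string is None
no_string = snaptext.string

# get_text() concatenates every string inside the tag
joined = snaptext.get_text()

# A separator is placed between the text fragments
lines = snaptext.get_text(separator='\n')

# stripped_strings yields each text fragment with surrounding whitespace removed
parts = list(snaptext.stripped_strings)

print(no_string)
print(joined)
print(parts)
```

If the line breaks matter, the separator form or stripped_strings is probably the better fit, since plain get_text() joins the fragments with nothing between them.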
elixenide
iScrE4m