
I have mainly used Python for data analysis and am new to scraping. I am trying to learn the BeautifulSoup package, but I can't get the following code to work.

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://pythonscraping.com/pages/warandpeace.html')
bsobj = BeautifulSoup(html)
name_list = bsobj.findAll('span',{'class':'green'})

I am getting an empty list.

The problem clearly comes from the 4th line, but I am not sure why. Everything here is standard, and I don't know what went wrong.

bsobj.prettify() 

Returns ''

But when I call html.read(), I can see all the HTML fine. The answers below do not solve the problem, which clearly comes from line 4. It doesn't matter whether I use bsobj.findAll() or bsobj.find_all(): they are equivalent and, as I mentioned, bsobj.prettify() returns ''.

Yan Song

2 Answers


I think that line should be bsobj = BeautifulSoup(html.read())
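
One possible reason the order of calls matters, assuming html.read() was called on the same response object before BeautifulSoup(html) ran (the question does not say in which order this happened): the object returned by urlopen is a file-like stream, and such a stream can only be read once. The sketch below uses io.BytesIO as a stand-in for the response so that it runs without network access; the name fake_response and the sample markup are made up for illustration.

```python
import io

# io.BytesIO stands in for the response object returned by urlopen();
# both are file-like streams that expose a read() method.
fake_response = io.BytesIO(b'<span class="green">Anna Pavlovna</span>')

first_read = fake_response.read()   # consumes the whole stream
second_read = fake_response.read()  # the stream is exhausted, nothing is left

print(first_read)
print(second_read)
```

If that is what happened, reading the bytes once into a variable and passing that variable to BeautifulSoup would avoid the problem.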

Sam Chats

findAll is a legacy name for find_all; prefer find_all:

bsobj.find_all('span',{'class':'green'})

For me it returns

[<span class="green">Anna
 Pavlovna Scherer</span>, <span class="green">Empress Marya
 Fedorovna</span>, <span class="green">Prince Vasili Kuragin</span>, <span class="green">Anna Pavlovna</span>, <span class="green">St. Petersburg</span>, <span class="green">the prince</span>, <span class="green">Anna Pavlovna</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the prince</span>, <span class="green">the prince</span>, <span class="green">the prince</span>, <span class="green">Prince Vasili</span>, <span class="green">Anna Pavlovna</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the prince</span>, <span class="green">Wintzingerode</span>, <span class="green">King of Prussia</span>, <span class="green">le Vicomte de Mortemart</span>, <span class="green">Montmorencys</span>, <span class="green">Rohans</span>, <span class="green">Abbe Morio</span>, <span class="green">the Emperor</span>, <span class="green">the prince</span>, <span class="green">Prince Vasili</span>, <span class="green">Dowager Empress Marya Fedorovna</span>, <span class="green">the baron</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the Empress</span>, <span class="green">the Empress</span>, <span class="green">Anna Pavlovna's</span>, <span class="green">Her Majesty</span>, <span class="green">Baron
 Funke</span>, <span class="green">The prince</span>, <span class="green">Anna
 Pavlovna</span>, <span class="green">the Empress</span>, <span class="green">The prince</span>, <span class="green">Anatole</span>, <span class="green">the prince</span>, <span class="green">The prince</span>, <span class="green">Anna
 Pavlovna</span>, <span class="green">Anna Pavlovna</span>]
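
The same call can be tried without network access on an inline snippet. This is only a sketch: it assumes bs4 is installed, the sample markup and variable names are made up for illustration, and 'html.parser' (the parser from the standard library) is named explicitly so that BeautifulSoup does not have to guess one.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure of the page in the question.
sample_html = """
<div>
  <span class="green">Anna Pavlovna</span>
  <span class="red">some quoted speech</span>
  <span class="green">Prince Vasili</span>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# Select only the <span> tags whose class is "green".
green_spans = soup.find_all('span', {'class': 'green'})

# get_text() returns the text inside each tag without the markup.
names = [span.get_text() for span in green_spans]
print(names)
```

An empty list here would point at the parsing step rather than at the find_all call.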
Tim Seed