
I want to find the HTML divs that have different ids but the same classes:

<div id="alabama" class="sc-fzoxKX fmCwKG state-entry">
<div id="alaska" class="sc-fzoxKX fmCwKG state-entry">

I tried to use

containers = page_soup.findAll("div", {"class":"sc-fzoxKX fmCwKG state-entry"})

But when I test it with len(containers), it returns 0, and containers[0] raises an index out of range error.

Could anyone offer some insight into how I can find these divs?

SMAKSS
  • Does this answer your question? [How to find elements by class](https://stackoverflow.com/questions/5041008/how-to-find-elements-by-class) – Jack Henahan May 24 '20 at 20:23

1 Answer


You should use class_ as a keyword argument to find_all when matching on multiple classes. Here is the full working code:

from bs4 import BeautifulSoup

htmltxt = '<div id="alabama" class="sc-fzoxKX fmCwKG state-entry"></div><div id="alaska" class="sc-fzoxKX fmCwKG state-entry"></div>'
page_soup = BeautifulSoup(htmltxt, 'html.parser')
container = page_soup.find_all("div", class_="sc-fzoxKX fmCwKG state-entry")

print(len(container)) # Gives 2
print(container) # Gives the two divs

# To get the respective ids of all the divs:
for div in container:
    print(div.get('id'))
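As an alternative sketch (not part of the original answer): a class_ string containing several classes only matches when the class attribute is written in exactly that order, while a CSS selector passed to select() matches every div carrying all the listed classes in any order. To show the difference, the second div below deliberately lists its classes in a different order than in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's HTML: alaska's classes are reordered on purpose
htmltxt = (
    '<div id="alabama" class="sc-fzoxKX fmCwKG state-entry"></div>'
    '<div id="alaska" class="state-entry fmCwKG sc-fzoxKX"></div>'
)
page_soup = BeautifulSoup(htmltxt, 'html.parser')

# Exact-string match on the class attribute: order matters, so only alabama matches
exact = page_soup.find_all("div", class_="sc-fzoxKX fmCwKG state-entry")

# CSS selector: any div that has all three classes, regardless of order
containers = page_soup.select("div.sc-fzoxKX.fmCwKG.state-entry")

print(len(exact))                              # 1
print(len(containers))                         # 2
print([div.get("id") for div in containers])   # ['alabama', 'alaska']
```

select() relies on the soupsieve package, which is installed alongside current bs4 releases.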

Also see: Difference between "findAll" and "find_all" in BeautifulSoup.

If you are using Beautiful Soup 3 (which you shouldn't; upgrade to version 4), find_all will not work and you have to use findAll as in your original code. In bs4, both names work.
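A small sketch confirming that in bs4 the camelCase findAll is just an older alias for find_all, so both calls return the same divs. It also shows that matching a single one of the classes (here state-entry) is enough to find both divs. Recent bs4 releases may emit a DeprecationWarning for findAll, which does not change the result.

```python
from bs4 import BeautifulSoup

htmltxt = (
    '<div id="alabama" class="sc-fzoxKX fmCwKG state-entry"></div>'
    '<div id="alaska" class="sc-fzoxKX fmCwKG state-entry"></div>'
)
page_soup = BeautifulSoup(htmltxt, 'html.parser')

# A single class name matches any tag whose class list contains it
new_style = page_soup.find_all("div", class_="state-entry")
# BS3-style camelCase name, still accepted by bs4
old_style = page_soup.findAll("div", class_="state-entry")

print(len(new_style))           # 2
print(new_style == old_style)   # True
```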

P.S. I added the closing </div> tags to both of your divs.
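For reference, a sketch of what happens with the two divs exactly as written in the question, without closing tags: html.parser closes them at the end of the document, so the second div ends up nested inside the first, and find_all still finds both. The missing tags therefore should not by themselves make len() return 0.

```python
from bs4 import BeautifulSoup

# The question's markup, with no closing </div> tags
htmltxt = (
    '<div id="alabama" class="sc-fzoxKX fmCwKG state-entry">\n'
    '<div id="alaska" class="sc-fzoxKX fmCwKG state-entry">'
)
page_soup = BeautifulSoup(htmltxt, 'html.parser')

containers = page_soup.find_all("div", class_="state-entry")
print(len(containers))                    # 2
# The unclosed first div becomes the parent of the second one
print(containers[1].parent.get("id"))     # alabama
```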

tanmay_garg
  • `len(container)` still returns 0 :'( – mayiango May 24 '20 at 20:36
  • Oh, I have edited my post. Please check if that is your issue. Which version of Beautiful Soup are you using? – tanmay_garg May 24 '20 at 20:37
  • Unfortunately, len(container) still returns 0. I think the problem is with the arguments I pass to findAll in general. Should I also match on the id, or is the class enough? – mayiango May 24 '20 at 20:46
  • @mayiango I put the full code in my answer and tested it too. Double check once please and use `find_all` if you aren't. – tanmay_garg May 24 '20 at 20:48
  • This code works on your snippet! :) However, when I build page_soup from the website https://www.cnn.com/interactive/2020/us/states-reopen-coronavirus-trnd/, the code fails again. Could you help me look at the HTML of this website? I suspect the failure is due to the HTML structure of this particular site. I want to extract the info under each state. Thank you in advance! – mayiango May 24 '20 at 21:07
  • Okay, so I went through the website. It isn't working because these divs are generated by JavaScript as the page loads. What bs4 gets is the page source, which you can see with Ctrl+U in Chrome and which does not contain these divs; you only see them with Inspect. For this type of problem, have a look at the `selenium` library in Python. It drives a real browser, so you can grab the HTML after the page has loaded and parse that. – tanmay_garg May 24 '20 at 21:11
  • Thank you so much! I'm taking a look into it :) – mayiango May 25 '20 at 20:34
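A minimal sketch of the Selenium approach suggested above, assuming Selenium 4 and a local Chrome install. The function names are hypothetical, and the wait condition assumes the rendered page still uses the state-entry class, which may no longer be true for the CNN page. Selenium renders the page; BeautifulSoup then parses the resulting HTML exactly as in the answer.

```python
from bs4 import BeautifulSoup


def extract_state_ids(html):
    """Return the id of every div with the state-entry class in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.get("id") for div in soup.select("div.state-entry")]


def fetch_rendered_html(url, timeout=15):
    """Load url in headless Chrome and return the HTML after its scripts have run."""
    # Imported here so extract_state_ids still works if Selenium is not installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one script-generated state div exists (assumed class name)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.state-entry"))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()
```

Usage would look like `extract_state_ids(fetch_rendered_html("https://www.cnn.com/interactive/2020/us/states-reopen-coronavirus-trnd/"))`. Keeping the parsing in its own function means it can be tested on plain HTML strings without starting a browser.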