How to use BeautifulSoup to find specific class elements on a web page

Question

Goal: Run a web search for a business and check the results for either a "Permanently Closed" notice or an "Open" status with hours, or basically anything other than "Permanently Closed."

Problem: I'm using BeautifulSoup to parse the search results, but it finds the correct element by class only about 50% of the time.

import time
import urllib.request  # load the request submodule explicitly; "import urllib" alone does not guarantee it
import urllib as u

import pandas
from bs4 import BeautifulSoup as bs

comp = pandas.DataFrame(data=[['ALL CITY FITNESS 2', '1005 E PESCADERO AVE SITE 211', 'TRACY', 'CA', '', '']],
                        columns=['NAME','ADDRESS','CITY','STATE','VERIFIED','STATUS'])
errors = []  # collects [row index, search string, exception] for failed lookups

for i in comp.index:
    if comp.loc[i, 'VERIFIED'] != 'YES':
        location, address, city, state = comp.loc[i, ['NAME', 'ADDRESS', 'CITY', 'STATE']]
        print(location, address, city, state)
        search_string = f'{location} {address} {city}, {state}'
        # search_html = Str(search_string).htmlconvert() # This is a custom function
        search_html = 'ALL%20CITY%20FITNESS%202%201005%20E%20PESCADERO%20AVE%20SITE%20211%20TRACY%2C%20CA'
        url = f'https://www.bing.com/search?q={search_html}'

        try:
            req = u.request.urlopen(url)
            soup = bs(req, "html.parser")  # Bing returns HTML, so use an HTML parser rather than "xml"
            
            # This checks if there is a Permanently Closed indicator on the page
            # This works pretty consistently
            alerts = soup.find_all(class_='b_alert')
            for item in alerts:
                print(item.text)
                # Mark Location as closed
                comp.loc[i, 'STATUS'] = 'INACTIVE'
            if not alerts:
                # (a for/else would run this branch on every pass, since the loop never breaks)
                # This however, and the one below it rarely work
                for check in soup.find_all(class_='e_green b_positive'):
                    print(check.text)

                for check in soup.find_all('span', class_='e_green b_positive'):
                    print(check.text)

            comp.loc[i, 'VERIFIED'] = 'YES'
            time.sleep(3)

        except Exception as e:
            errors.append([i, search_string, e])
print(comp)

I performed this search manually and inspected the element, which is where I got this class name. I've tried joining the classes with a '.', as in 'e_green.b_positive', and also leaving it out, as shown above. Neither seems to work, or at least neither works 100% of the time. What is wrong with my syntax that causes these elements to be missed?
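
For reference, here is a minimal sketch of how the two class syntaxes behave in BeautifulSoup. The markup is made up for illustration and may not match what Bing actually returns. A `class_` string containing a space is compared against the whole class attribute, while a dotted name such as 'e_green.b_positive' is only meaningful as a CSS selector passed to `select()`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for illustration; Bing's real HTML may differ.
SAMPLE_HTML = """
<div>
  <span class="e_green b_positive">Open</span>
  <span class="b_positive e_green">Open until 9 PM</span>
  <span class="e_green b_positive b_extra">Open 24 hours</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A string containing a space is compared against the whole class attribute,
# so only the first span (same classes, same order, nothing extra) matches.
exact = [tag.text for tag in soup.find_all("span", class_="e_green b_positive")]

# A CSS selector requires both classes but ignores their order and any extras.
both = [tag.text for tag in soup.select("span.e_green.b_positive")]

print(exact)
print(both)
```

If the page sometimes lists the classes in a different order or adds a third class, the `find_all` form would miss those elements while the `select` form would still find them.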

There's a good chance that the page(s) you're scraping are dynamic - i.e., constructed (at least in part) using Javascript. BeautifulSoup isn't always helpful in such cases. Try *selenium* — DarkKnight, Apr 01 '22 at 16:30
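
One way to check whether the page is dynamic, sketched under the assumption that you already have the fetched HTML as a string: count the matches in the raw response. If the count is zero for a response you just downloaded, the element was not in the static HTML and no BeautifulSoup syntax will find it. The helper name `count_matches` and both sample strings are hypothetical.

```python
from bs4 import BeautifulSoup


def count_matches(html, selector):
    """Return how many elements in the given HTML match a CSS selector."""
    return len(BeautifulSoup(html, "html.parser").select(selector))


# Hypothetical responses, made up for illustration.
static_html = '<span class="e_green b_positive">Open</span>'
script_only_html = '<div id="b_results"></div><script>/* fills in results later */</script>'

print(count_matches(static_html, ".e_green.b_positive"))
print(count_matches(script_only_html, ".e_green.b_positive"))
```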

score 1 · Accepted Answer · answered Apr 01 '22 at 18:07

I'm not sure why this affects it, but the problem has to do with how you're encoding your search string (your search_html variable), or rather with the final form of the URL you use to run the search.

Add '&qs=n&form=QBRE&=%25eManage%20Your%20Search%20History%25E&sp=-1&p' to the end of your url variable, and I bet your code will find those class items now.
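
A minimal sketch of building such a URL with the standard library instead of concatenating strings by hand. It uses only the `qs`, `form`, and `sp` parameters from the string above and leaves out the parts whose keys are unclear; whether these parameters change what Bing returns is my guess from testing, not something documented. `urllib.parse.quote` stands in here for the custom `htmlconvert` function from the question.

```python
from urllib.parse import quote, urlencode

search_string = "ALL CITY FITNESS 2 1005 E PESCADERO AVE SITE 211 TRACY, CA"

# Parameter names copied from the suggestion above; their effect is an assumption.
params = {
    "q": search_string,
    "qs": "n",
    "form": "QBRE",
    "sp": "-1",
}

# quote_via=quote encodes spaces as %20 (the default, quote_plus, would use "+").
url = "https://www.bing.com/search?" + urlencode(params, quote_via=quote)
print(url)
```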
