Goal: Run a web search for a business and check the results for either a "Permanently closed" notice or an "Open" status with hours (basically anything BUT "Permanently closed").
Problem: I'm using BeautifulSoup to parse the search results, but it only seems to find the correct element by class about 50% of the time.
import urllib.request
import pandas
from bs4 import BeautifulSoup as bs
import time
from PIL import Image
from io import BytesIO, StringIO

comp = pandas.DataFrame(data=[['ALL CITY FITNESS 2', '1005 E PESCADERO AVE SITE 211', 'TRACY', 'CA', '', '']],
                        columns=['NAME','ADDRESS','CITY','STATE','VERIFIED','STATUS'])
errors = []  # collects [index, search string, exception] for failed lookups

for i in comp.index:
    if comp.loc[i, 'VERIFIED'] != 'YES':
        location, address, city, state = comp.loc[i, ['NAME', 'ADDRESS', 'CITY', 'STATE']]
        print(location, address, city, state)
        search_string = f'{location} {address} {city}, {state}'
        # search_html = Str(search_string).htmlconvert() # This is a custom function
        search_html = 'ALL%20CITY%20FITNESS%202%201005%20E%20PESCADERO%20AVE%20SITE%20211%20TRACY%2C%20CA'
        url = f'https://www.bing.com/search?q={search_html}'
        try:
            req = urllib.request.urlopen(url)
            soup = bs(req, "xml")
            # This checks if there is a Permanently Closed indicator on the page
            # This works pretty consistently
            for item in soup.find_all(class_='b_alert'):
                print(item.text)
                # Mark Location as closed
                comp.loc[i, 'STATUS'] = 'INACTIVE'
            else:
                # for/else: this branch runs whenever the loop above ends without a break
                # This however, and the one below it rarely work
                for check in soup.find_all(class_='e_green b_positive'):
                    print(check.text)
                for check in soup.find_all('span', class_='e_green b_positive'):
                    print(check.text)
            comp.loc[i, 'VERIFIED'] = 'YES'
            time.sleep(3)
        except Exception as e:
            errors.append([i, search_string, e])
print(comp)
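For anyone reproducing this without the custom htmlconvert() function: a minimal sketch of how the hard-coded search_html value could be generated, assuming the standard library's urllib.parse.quote is an acceptable stand-in for it (that substitution is an assumption, not what the original helper does internally).

```python
from urllib.parse import quote

# Same search string the loop builds for the sample row
search_string = "ALL CITY FITNESS 2 1005 E PESCADERO AVE SITE 211 TRACY, CA"

# quote() percent-encodes spaces as %20 and the comma as %2C
search_html = quote(search_string)
url = f"https://www.bing.com/search?q={search_html}"

print(search_html)
print(url)
```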
I performed this search manually and inspected the element, which is where I got this class name. I've tried adding the '.' so that it was 'e_green.b_positive', and also removing it, as shown above. Neither seems to work, or at least neither works 100% of the time. What is wrong with my syntax that causes these elements to be missed?
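To make the two lookup styles concrete, here is a small self-contained sketch on hand-written HTML. The markup is made up for illustration and is not Bing's actual output, and the "html.parser" backend is an assumption chosen so the snippet runs without extra dependencies. It compares find_all() with a two-class string against a CSS selector passed to select(), which is where the dotted 'e_green.b_positive' form applies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes in different orders and combinations
html = """
<div>
  <span class="e_green b_positive">Open</span>
  <span class="b_positive e_green">Open 24 hours</span>
  <span class="e_green b_positive extra">Open until 9 PM</span>
  <span class="b_alert">Permanently closed</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A class_ string containing a space is compared against the full class
# attribute value, so order and any extra classes matter.
exact = [s.get_text() for s in soup.find_all(class_="e_green b_positive")]

# A CSS selector matches any span carrying both classes, in any order.
both = [s.get_text() for s in soup.select("span.e_green.b_positive")]

print(exact)
print(both)
```

On this sample, the find_all() call returns only the first span, while the selector returns all three "Open" spans. Whether that difference explains the misses on the live page depends on the HTML the server actually returns to urllib, which may not match what the browser inspector shows.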