I'm having trouble scraping data from a website. The problem might be with Visual Studio Code (I am running the script with the "Code Runner" extension), but this is my first time using Beautiful Soup and Selenium, so it might also be with my code. I started last Friday and, after some difficulty, came up with a solution on Saturday. My code is:
import requests
from bs4 import BeautifulSoup, SoupStrainer
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
parcelID = 1014100000 #this is a random parcelID I grabbed from the site
url = 'https://www.manateepao.com/parcel/?parid={}'.format(parcelID)
driver = webdriver.Chrome()
driver.get(url)
html = driver.execute_script("return document.documentElement.outerHTML")
# print(html) raised an encoding error, so I replace the character that was giving me trouble
newHTML = html.replace(u"\u2715", "*")
soupFilter = SoupStrainer('div', {'id': 'ownerContentScollContainer'})
soup = BeautifulSoup(newHTML, 'html.parser', parse_only=soupFilter)
webparcelID = soup.find_all('b')
lColumn = soup.find_all('div', {'class' : 'col-sm-2 m-0 p-0 text-sm-right'})
rColumn = soup.find_all('div', {'class' : 'col-sm m-0 p-0 ml-2'})
parcel_Dict = {}
for i in range(len(lColumn)):
    parcel_Dict[i] = {lColumn[i].string: rColumn[i].string}
#This is to test if I got any results or not
print(parcel_Dict)
driver.close()
driver.quit()
What I am hoping to find each time I scrape a page is:
- The Parcel ID. It is in its own bold (b) tag.
- The Ownership and Mailing Address. The Ownership should always be at parcel_Dict[1] and the mailing address should always be at parcel_Dict[3].
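To make the layout I expect concrete, here is a small sketch that uses made-up strings in place of the text from the lColumn and rColumn divs. The helper name pair_columns and all of the sample labels and values are hypothetical; none of them come from the site or from my script above.

```python
def pair_columns(labels, values):
    """Pair label and value strings by position, keyed by row index."""
    # zip stops at the shorter list, so a missing value cannot raise IndexError
    return {i: {label: value}
            for i, (label, value) in enumerate(zip(labels, values))}


# Hypothetical sample text, not copied from the site
labels = ["Label 0", "Ownership", "Label 2", "Mailing Address"]
values = ["VALUE 0", "EXAMPLE OWNER", "VALUE 2", "123 EXAMPLE ST"]

parcel_Dict = pair_columns(labels, values)
print(parcel_Dict[1])  # the Ownership row
print(parcel_Dict[3])  # the Mailing Address row

# If both find_all calls match nothing, the result is an empty dictionary
print(pair_columns([], []))
```

The last call shows the failing case: an empty dictionary means that both column lists were empty when the loop ran.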
I run the code and sometimes I get a result, and other times I get an empty dictionary.
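One guess I have not confirmed is that this is a timing problem: the HTML may sometimes be read before the page has finished filling in the owner section. Below is a minimal sketch of a check I could add, assuming that guess is right. The helper wait_for_html, the is_ready test, and the fake pages are all hypothetical; the fake pages only stand in for the browser so the snippet runs on its own.

```python
import time


def wait_for_html(get_html, is_ready, timeout=10.0, poll=0.5):
    """Call get_html() until is_ready(html) is true or the timeout passes.

    Returns the last HTML string fetched, whether or not it was ready.
    """
    deadline = time.monotonic() + timeout
    html = get_html()
    while not is_ready(html) and time.monotonic() < deadline:
        time.sleep(poll)
        html = get_html()
    return html


# Stand-in for the browser: the owner block only appears on the third read.
# With Selenium, get_html would instead be something like
#   lambda: driver.execute_script("return document.documentElement.outerHTML")
fake_pages = iter([
    "<html></html>",
    "<html></html>",
    '<html><div id="ownerContentScollContainer"><b>1014100000</b></div></html>',
])

html = wait_for_html(
    get_html=lambda: next(fake_pages),
    is_ready=lambda page: "<b>" in page,  # assumed sign that the content loaded
    timeout=5.0,
    poll=0.01,
)
print("<b>" in html)
```

Selenium also ships its own explicit waits (WebDriverWait), which may be the better tool if the timing guess turns out to be correct.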
Thank you for any help you can provide.