
I'm having issues scraping data from a website. The issue might be with Visual Studio Code (I'm using the "Code Runner" extension), but this is my first time using Beautiful Soup and Selenium, so the issue might also be in my code. I started last Friday and, after some difficulty, came up with a solution on Saturday. My code is:

import requests
from bs4 import BeautifulSoup, SoupStrainer
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

parcelID = 1014100000 #this is a random parcelID I grabbed from the site
url = 'https://www.manateepao.com/parcel/?parid={}'.format(parcelID)

driver = webdriver.Chrome()
driver.get(url)

html = driver.execute_script("return document.documentElement.outerHTML")
#was getting encoding error with print(html). replaced character that was giving me trouble
newHTML = html.replace(u"\u2715", "*")

soupFilter = SoupStrainer('div', {'id': 'ownerContentScollContainer'})
soup = BeautifulSoup(newHTML, 'html.parser', parse_only=soupFilter)

webparcelID = soup.find_all('b')
lColumn = soup.find_all('div', {'class' : 'col-sm-2 m-0 p-0 text-sm-right'})
rColumn = soup.find_all('div', {'class' : 'col-sm m-0 p-0 ml-2'})

parcel_Dict = {}
for i in range(len(lColumn)):
    parcel_Dict[i] = {lColumn[i].string: rColumn[i].string}

#This is to test if I got any results or not
print(parcel_Dict)

driver.close()
driver.quit()

What I am hoping to find each time I scrape a page is:

  1. The Parcel ID. This is in its own bold, b, tag.
  2. The Ownership and Mailing Address. The Ownership should always be at parcel_Dict[1] and the mailing address should always be at parcel_Dict[3].

When I run the code, sometimes I get a result, and other times I get an empty dictionary.
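One way to narrow down whether the empty dictionary comes from the selectors or from page timing is to run the same SoupStrainer/BeautifulSoup logic against a static HTML snippet. The snippet below is an assumed structure I modeled on the class names in my code, not a copy of the live page; if this parses correctly but the live run returns an empty dict, the problem is likely page load timing rather than the selectors:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Assumed structure mimicking the container and column classes used above;
# the label/value text is made up for illustration.
sample_html = """
<div id="ownerContentScollContainer">
  <b>1014100000</b>
  <div class="col-sm-2 m-0 p-0 text-sm-right">Ownership:</div>
  <div class="col-sm m-0 p-0 ml-2">JANE DOE</div>
  <div class="col-sm-2 m-0 p-0 text-sm-right">Mailing Address:</div>
  <div class="col-sm m-0 p-0 ml-2">123 MAIN ST</div>
</div>
"""

# Same filter and lookups as the live-scraping code.
soup_filter = SoupStrainer('div', {'id': 'ownerContentScollContainer'})
soup = BeautifulSoup(sample_html, 'html.parser', parse_only=soup_filter)

labels = soup.find_all('div', {'class': 'col-sm-2 m-0 p-0 text-sm-right'})
values = soup.find_all('div', {'class': 'col-sm m-0 p-0 ml-2'})

parcel_dict = {i: {labels[i].string: values[i].string}
               for i in range(len(labels))}
print(parcel_dict)
```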

Thank you for any help you can provide.

1 Answer


I solved my own issue by adding the following lines of code:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.XPATH, "//div[@id='ownerContentScollContainer']")))

This waits until the ownerContentScollContainer element is present in the DOM before executing the rest of the code.

This post and this post helped me figure out where I might be going wrong. I used this tutorial to figure out how to use the appropriate Xpath.

  • I can't accept my answer for another two days. Should I just delete this question in its entirety? – Christopher Brown Apr 27 '21 at 18:52
  • If you want to delete your answer, you can flag it and admins will delete it. I don't see any reason to flag it myself; there are no significant problems. – vitaliis Apr 28 '21 at 05:21