I'm trying to get the cell phone and office phone numbers from this page: https://www.zillow.com/lender-profile/DougShoemaker/

I've tried playing around with bs4, but I can only get the first phone number. I'm trying to get both the office and cell numbers.

from selenium import webdriver
from bs4 import BeautifulSoup
import time


#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
driver.get('https://www.zillow.com/lender-profile/DougShoemaker/')
soup = BeautifulSoup(driver.page_source, 'html.parser')
time.sleep(2)
phoneNum = driver.find_element_by_class_name('zsg-list_definition')
trial = phoneNum.find_element_by_class_name('zsg-sm-hide')
print(trial.text)
mcfoyt
  • What problem are you having? What happens when you try to get the office and cell numbers? Do you get an incorrect result, an empty result, or an error message? – John Gordon Jul 01 '19 at 19:43
  • I get the correct result for the first (cell) phone number; I just cannot figure out the correct paths or searches to get both phone numbers. The code above successfully finds and prints the first phone number, but I can't get past that. The way the tags are set up makes it tricky to get the desired information. @John Gordon – mcfoyt Jul 01 '19 at 19:48

3 Answers

You don't have to use Selenium, or even BeautifulSoup. If you inspect the network requests in Developer Tools (F12) > Network, you can see that the data is fetched using an XHR request.

You can make this request yourself and use the JSON response any way you like.

POST https://mortgageapi.zillow.com/getRegisteredLender?partnerId=RD-CZMBMCZ
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0
Referer: https://www.zillow.com/lender-profile/DougShoemaker/
Content-Type: application/json

{
  "fields": [
    "aboutMe",
    "address",
    "cellPhone",
    # ... other fields
    "website"
  ],
  "lenderRef": {
    "screenName": "DougShoemaker"
  }
}

With the requests library, you can try:

import requests

if __name__ == '__main__':
    payload = {
        "fields": [
            "screenName",
            "cellPhone",
            "officePhone",
            "title",
        ],
        "lenderRef": {
            "screenName": "DougShoemaker"
        }
    }

    res = requests.post('https://mortgageapi.zillow.com/getRegisteredLender?partnerId=RD-CZMBMCZ',
                        json=payload)
    res.raise_for_status()
    data = res.json()

    cellphone, office_phone = data['lender']['cellPhone'], data['lender']['officePhone']
    cellphone_num = '({areaCode}) {prefix}-{number}'.format(**cellphone)
    office_phone_num = '({areaCode}) {prefix}-{number}'.format(**office_phone)
    print(office_phone_num, cellphone_num)

which prints:

(618) 619-4120 (618) 795-0790
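
If a lender has no cell or office number listed, the direct indexing above would presumably raise a KeyError or TypeError. The following is a minimal sketch of a more defensive version, assuming the response keeps the same shape (a 'lender' object whose phone entries carry areaCode, prefix and number). The helper names format_phone and extract_phones are hypothetical, and the sample dict is built by hand from the output shown above rather than copied from a real API response.

```python
def format_phone(phone):
    """Format a phone dict such as {'areaCode': '618', 'prefix': '619', 'number': '4120'}.

    Returns None when the dict is missing or lacks one of the three parts.
    """
    if not phone:
        return None
    try:
        return '({areaCode}) {prefix}-{number}'.format(**phone)
    except KeyError:
        return None


def extract_phones(data):
    """Pull the office and cell numbers out of the parsed JSON response."""
    lender = data.get('lender') or {}
    return {
        'office': format_phone(lender.get('officePhone')),
        'cell': format_phone(lender.get('cellPhone')),
    }


if __name__ == '__main__':
    # Hand-built sample mirroring the assumed response shape (not real API output).
    sample = {
        'lender': {
            'officePhone': {'areaCode': '618', 'prefix': '619', 'number': '4120'},
            'cellPhone': {'areaCode': '618', 'prefix': '795', 'number': '0790'},
        }
    }
    print(extract_phones(sample))
```

In the script above, you would call extract_phones(res.json()) in place of the manual indexing and formatting.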
abdusco

Try the following XPath expression for each phone number:

Office Phone:
//dt[contains(text(),'Office')]/following-sibling::dd/div/span
Cell Phone:
//dt[contains(text(),'Cell')]/following-sibling::dd/div/span
Fax Number:
//dt[contains(text(),'Fax')]/following-sibling::dd/div/span
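
As a rough sketch of how these expressions might be used with Selenium, the helper below builds the XPath from a label and collects whatever it finds. The function names phone_xpath and read_phone_numbers are hypothetical, and the code assumes `driver` is a WebDriver that has already loaded the profile page and finished rendering it. It uses find_elements (plural) so a missing label gives an empty list instead of raising an exception.

```python
def phone_xpath(label):
    """Build the XPath for the <span> under the <dd> that follows the <dt> containing `label`."""
    return "//dt[contains(text(),'{}')]/following-sibling::dd/div/span".format(label)


def read_phone_numbers(driver, labels=("Office", "Cell", "Fax")):
    """Return {label: text} for each label found on the already-loaded page."""
    # Imported here so phone_xpath() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    numbers = {}
    for label in labels:
        matches = driver.find_elements(By.XPATH, phone_xpath(label))
        if matches:
            # following-sibling::dd can match several <dd> elements;
            # the first match in document order is the one next to the label.
            numbers[label] = matches[0].text
    return numbers
```

With the question's setup, this would be called as read_phone_numbers(driver) after driver.get(...) and a suitable wait.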
Sureshmani Kalirajan

To extract the Office, Cell, and Fax numbers, use WebDriverWait with visibility_of_element_located() so that each element is visible before you read it. The following XPath-based locators work:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions()
    options.add_argument('start-maximized')
    # options.add_argument('disable-infobars')
    options.add_argument('--disable-extensions')
    # `options=` replaces the deprecated `chrome_options=` keyword
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.zillow.com/lender-profile/DougShoemaker/')
    wait = WebDriverWait(driver, 20)
    # Wait up to 20 seconds for each number to become visible, then print it
    for label in ('Office', 'Cell', 'Fax'):
        xpath = "//dt[text()='{}']//following::dd[1]//span".format(label)
        print(wait.until(EC.visibility_of_element_located((By.XPATH, xpath))).get_attribute("innerHTML"))
    
  • Console Output:

    (618) 619-4120
    (618) 795-0790
    (618) 619-4120
    
undetected Selenium