0

I took a Python class my junior year of college but have forgotten a lot. For work I was asked to find a way to scrape some data from a website. I have a Python file that does something similar for a different site I use. Here is that code:

from bs4 import BeautifulSoup
import io
import requests

soup = BeautifulSoup(requests.get("https://servicenet.dewalt.com/Parts/Search?searchedNumber=N365763").content, "html.parser")

rows = soup.select("#customerList tbody tr")
with io.open("data.txt", "w", encoding="utf-8") as f:
   f.write(u", ".join([row.select_one("td a").text for row in rows]))

This gets a list of model numbers for power tool parts for that site. Now I basically want to do the same thing but I don't know where to begin. The site is https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/#/s/BRG//498260/1/y

You click on the "Where Used" button and then there is a list of model numbers ("093412-0011-01", "093412-0011-02", etc.). I want those numbers written to a text file separated by commas, just like in my first code ("093412-0011-01, 093412-0011-02, ..."). Any help is much appreciated. Thanks!

Ali
Thomas

2 Answers

3

I used Selenium to drive a real browser, so the script can click the buttons and navigate the page.

Code:

import io
import time
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium initialization
driver = webdriver.Chrome()
driver.get('https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/#/s/BRG//498260/1/y')
wait = WebDriverWait(driver, 30)
driver.maximize_window()

# Locating the "Where Used" Button
# find_element(By.XPATH, ...) works in both older and newer Selenium releases
driver.find_element(By.XPATH, "//input[@id='aripartsSearch_whereUsedBtn_0'][@class='ariPartListWhereUsed ariImageOverride'][@title='Where Used']").click()
wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="ari_searchResults_Grid"]/ul')))


# Initializing BS4 and looking for the "Show More" button
soup = BeautifulSoup(driver.page_source, "html.parser")
show = soup.find('li', {'class': 'ari-search-showMore'})

# Keep clicking the "Show More" button until it is no longer visible
while show is not None:
    time.sleep(2)
    hidden_element = driver.find_element(By.CSS_SELECTOR, '#ari-showMore-unhide')
    if hidden_element.is_displayed():
        print("Element found")
        hidden_element.click()
        # Re-parse the page so the soup reflects the current state of the list
        soup = BeautifulSoup(driver.page_source, "html.parser")
        show = soup.find('li', {'class': 'ari-search-showMore'})
    else:
        print("Element not found")
        break

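# Parse the page once more so rows loaded by the last click are included
soup = BeautifulSoup(driver.page_source, "html.parser")

# Write the parsed data to the text file "data.txt"
with io.open("data.txt", "w", encoding="utf-8") as f:
    rows = soup.find_all('li', {'class': 'ari-ModelByPrompt'})
    for row in rows:
        # Remove the spaces and newlines surrounding each model number
        part = str(row.text).replace(" ", "").replace("\n", "")
        print(part)
        f.write(part + ",")

# Close the browser when done
driver.quit()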

Output:

Element found
Element found
Element found
Element not found
093412-0011-01
093412-0011-02
093412-0015-01
093412-0039-01
093412-0060-01
093412-0136-01
093412-0136-02
093412-0139-01
093412-0150-01
093412-0153-01
093412-0154-01
093412-0169-01
093412-0169-02
093412-0172-01
093412-0174-01
093412-0315-A1
093412-0339-A1
093412-0360-A1
093412-0636-A1
093412-0669-A1
093412-1015-E1
093412-1039-E1
093412-1060-E1
093412-1236-E1
093412-1236-E2
093412-1253-E1
093412-1254-E1
093412-1269-E1
093412-1274-E1
093412-1278-E1
093432-0035-01
093432-0035-02
093432-0035-03
093432-0036-01
093432-0036-03
093432-0036-04
093432-0037-01
093432-0038-01
093432-0038-03
093432-0041-01
093432-0140-01
093432-0145-01
093432-0149-01
093432-0152-01
093432-0157-01
093432-0158-01
093432-0160-01
093432-0192-B1
093432-0335-A1
093432-0336-A1
093432-0337-A1
093432-0338-A1
093432-1035-B1
093432-1035-E1
093432-1035-E2
093432-1035-E4
093432-1036-B1
093432-1036-E1
093432-1037-E1
093432-1038-B1
093432-1038-E1
093432-1240-B1
093432-1240-E1
093432-1257-E1
093432-1258-E1
093432-1280-B1
093432-1280-E1
093432-1281-B1
093432-1281-E1
093432-1282-B1
093432-1282-E1
093432-1286-B1
093452-0049-01
093452-0141-01
093452-0168-01
093452-0349-A1
093452-1049-B1
093452-1049-E1
093452-1049-E5
093452-1241-E1
093452-1242-E1
093452-1277-E1
093452-1283-B1
093452-1283-E1
09A412-0267-E1
09A413-0201-E1
09A413-0202-E1
09A413-0202-E2
09A413-0202-E3
09A413-0203-E1
09A413-0522-E1
09K432-0022-01
09K432-0023-01
09K432-0024-01
09K432-0115-01
09K432-0116-01
09K432-0116-02
09K432-0117-01
09K432-0118-01
120502-0015-E1

Content of the file:

093412-0011-01,093412-0011-02,093412-0015-01,093412-0039-01,093412-0060-01,093412-0136-01,093412-0136-02,093412-0139-01,093412-0150-01,093412-0153-01,093412-0154-01,093412-0169-01,093412-0169-02,093412-0172-01,093412-0174-01,093412-0315-A1,093412-0339-A1,093412-0360-A1,093412-0636-A1,093412-0669-A1,093412-1015-E1,093412-1039-E1,093412-1060-E1,093412-1236-E1,093412-1236-E2,093412-1253-E1,093412-1254-E1,093412-1269-E1,093412-1274-E1,093412-1278-E1,093432-0035-01,093432-0035-02,093432-0035-03,093432-0036-01,093432-0036-03,093432-0036-04,093432-0037-01,093432-0038-01,093432-0038-03,093432-0041-01,093432-0140-01,093432-0145-01,093432-0149-01,093432-0152-01,093432-0157-01,093432-0158-01,093432-0160-01,093432-0192-B1,093432-0335-A1,093432-0336-A1,093432-0337-A1,093432-0338-A1,093432-1035-B1,093432-1035-E1,093432-1035-E2,093432-1035-E4,093432-1036-B1,093432-1036-E1,093432-1037-E1,093432-1038-B1,093432-1038-E1,093432-1240-B1,093432-1240-E1,093432-1257-E1,093432-1258-E1,093432-1280-B1,093432-1280-E1,093432-1281-B1,093432-1281-E1,093432-1282-B1,093432-1282-E1,093432-1286-B1,093452-0049-01,093452-0141-01,093452-0168-01,093452-0349-A1,093452-1049-B1,093452-1049-E1,093452-1049-E5,093452-1241-E1,093452-1242-E1,093452-1277-E1,093452-1283-B1,093452-1283-E1,09A412-0267-E1,09A413-0201-E1,09A413-0202-E1,09A413-0202-E2,09A413-0202-E3,09A413-0203-E1,09A413-0522-E1,09K432-0022-01,09K432-0023-01,09K432-0024-01,09K432-0115-01,09K432-0116-01,09K432-0116-02,09K432-0117-01,09K432-0118-01,120502-0015-E1,
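Note that the file has no space after each comma and ends with a trailing comma. If you need the exact format from the question ("093412-0011-01, 093412-0011-02, ..."), one option is to collect the cleaned strings and join them. This is only a minimal sketch; the helper name `join_models` is made up for illustration and is not part of the script above:

```python
def join_models(raw_texts):
    """Strip all whitespace from each entry and join with ", " (no trailing comma)."""
    # "".join(text.split()) removes spaces, newlines and tabs in one step
    cleaned = ["".join(text.split()) for text in raw_texts]
    # Skip entries that were empty after cleaning
    return ", ".join(part for part in cleaned if part)


print(join_models([" 093412-0011-01\n", "093412-0011-02 "]))
```

In the script you would then replace the loop with something like `f.write(join_models(row.text for row in rows))`.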
Ali
  • That's exactly what I want but when I run your code, I'm getting an error that says: "FileNotFoundError: [WinError 2] The system cannot find the file specified" and another that says "selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH." – Thomas Sep 25 '17 at 14:27
  • I figured I need to install the packages "webdriver" and "time", but when I try to do that I get an error that says "Could not find a version that satisfies the requirement time (from versions: ) No matching distribution found for time". All of the other packages are updated. – Thomas Sep 25 '17 at 14:31
  • Hi Thomas, I believe the "time" library is built into Python 3. The only thing you will need to install is Selenium, using pip: "pip install selenium". You will also need to download chromedriver from this [link](https://sites.google.com/a/chromium.org/chromedriver/downloads) and add the executable to your PATH (I usually put it in the project directory). Refer to this [link](http://selenium-python.readthedocs.io/installation.html#) for overall step-by-step instructions. – Ali Sep 25 '17 at 16:22
  • Ali, I added the file to C:\Users\Thomas\PycharmProjects\PowerToolSuperstore but I'm still getting the error "FileNotFoundError: [WinError 2] The system cannot find the file specified". Did I put it in the wrong place? – Thomas Sep 25 '17 at 19:50
  • Are you sure you downloaded the Windows version of the [chromedriver](https://chromedriver.storage.googleapis.com/2.32/chromedriver_win32.zip) executable and extracted chromedriver.exe from chromedriver_win32.zip into your project location? Also please refer to this [answer](https://stackoverflow.com/questions/29858752/error-message-chromedriver-executable-needs-to-be-available-in-the-path). – Ali Sep 25 '17 at 20:01
  • Got it. Awesome - it works perfectly! Thanks for the help! – Thomas Sep 25 '17 at 21:05
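
For anyone hitting the same "'chromedriver' executable needs to be in PATH" error, a quick way to check whether Python can see the driver is to search PATH yourself. This is a small standard-library sketch; the function name `locate_driver` and the choice to also search the current directory are assumptions for illustration, not something Selenium provides:

```python
import os
import shutil


def locate_driver(name="chromedriver", extra_dirs=(".",)):
    """Return the full path of the driver executable, or None if it is not found."""
    # Search the extra directories first, then everything already on PATH
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    # On Windows, shutil.which also tries the usual extensions such as .exe
    return shutil.which(name, path=search_path)


print(locate_driver())
```

If this prints `None` when run from the project directory, the executable is either missing, still zipped, or in a folder that is not being searched.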
1

1) Open Chrome to https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/#/s/BRG//498260/1/y

2) Open the Network tab in the developer tools

3) Click on "Where Used"

4) See the API call to the endpoint 'GetModelSearchModelsForPrompt'

5) Copy url https://partstream.arinet.com/Search/GetModelSearchModelsForPrompt?cb=jsonp1506134982932&arib=BRG&arisku=498260&modelName=&responsive=true&arik=AjydG6MJi4Y9noWP0hFB&aril=en-US&ariv=https%253A%252F%252Fwww.powertoolreplacementparts.com%252Fbriggs-stratton-part-finder%252F

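6) Open that with requests. You will need some clever thinking to parse the response, because they are returning HTML inside "JSON" (the cb=jsonp... parameter in the URL suggests the JSON is also wrapped in a JSONP callback).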
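
A rough sketch of how step 6 could look. It rests on several assumptions, since the exact response was not inspected here: that the body has the form `callback({...});`, that the model list arrives as an HTML string somewhere inside that JSON, and that the items use the same `ari-ModelByPrompt` class seen on the rendered page. The helper names are made up for illustration. The parsing uses only the standard library, and `requests` is needed only for the actual download:

```python
import json
import re
from html.parser import HTMLParser


def strip_jsonp(text):
    """Decode the JSON payload inside a JSONP body such as cb({...});"""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("response does not look like JSONP")
    return json.loads(match.group(1))


def html_strings(value):
    """Yield every string in the decoded JSON that looks like it contains HTML."""
    if isinstance(value, str):
        if "<" in value:
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from html_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from html_strings(item)


class ModelCollector(HTMLParser):
    """Collect the text of <li> elements that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.models = []
        self._buffer = None  # text pieces while inside a matching <li>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.models.append(text)
            self._buffer = None


def models_from_jsonp(text, css_class="ari-ModelByPrompt"):
    """Return the model numbers found in a JSONP response body."""
    models = []
    for fragment in html_strings(strip_jsonp(text)):
        collector = ModelCollector(css_class)  # assumed class name, see above
        collector.feed(fragment)
        collector.close()
        models.extend(collector.models)
    return models


def fetch_models(url):
    """Download the endpoint URL copied in step 5 and parse it."""
    import requests  # third-party; imported here so the helpers above stay stdlib-only
    return models_from_jsonp(requests.get(url).text)
```

With the URL from step 5, `", ".join(fetch_models(url))` would give the comma-separated string to write to the text file. Whether the `cb` and `arik` values in that URL stay valid over time is not known here, so the URL may need to be copied fresh from the Network tab.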