
I'm fairly new to coding and I'm supposed to be parsing Yelp reviews so I can analyze the data with Pandas. I've been trying to use Selenium/BeautifulSoup to automate the whole process, but I can't get past the webdriver/chromedriver errors in every version of the code I write.

!pip install selenium
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import os

# Set the path to the ChromeDriver executable
chromedriver_path = "C:\\Users\\5mxz2\\Downloads\\chromedriver\\chromedriver"

# Set the URL of the Yelp page you want to scrape
url = "https://www.yelp.com/biz/gelati-celesti-virginia-beach-2"

# Set the options for Chrome
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")  # Run Chrome in headless mode, comment this line if you want to see the browser window

# Create the ChromeDriver instance
driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_options)

# Load the Yelp page
driver.get(url)

# Extract the page source and pass it to BeautifulSoup
soup = BeautifulSoup(driver.page_source, "html.parser")

# Find all review elements on the page
reviews = soup.find_all("div", class_="review")

# Create empty lists to store the extracted data
review_texts = []
ratings = []
dates = []

# Iterate over each review element
for review in reviews:
    # Extract the review text
    review_text = review.find("p", class_="comment").get_text()
    review_texts.append(review_text.strip())

    # Extract the rating
    rating = review.find("div", class_="rating").get("aria-label")
    ratings.append(rating)

    # Extract the date
    date = review.find("span", class_="rating-qualifier").get_text()
    dates.append(date.strip())

# Create a DataFrame from the extracted data
data = {
    "Review Text": review_texts,
    "Rating": ratings,
    "Date": dates
}
df = pd.DataFrame(data)

# Print the DataFrame
print(df)

# Get the current working directory
path = os.getcwd()

# Save the DataFrame as a CSV file
csv_path = os.path.join(path, "yelp_reviews.csv")
df.to_csv(csv_path, index=False)

# Close the ChromeDriver instance
driver.quit()

That's what I have so far, but I keep getting this error message:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-5712027ca0bf> in <cell line: 18>()
     16 
     17 # Create the ChromeDriver instance
---> 18 driver = webdriver.Chrome(executable_path=chromedriver_path, options=chrome_options)
     19 
     20 # Load the Yelp page

TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

Can someone help me fix this please? And if anyone has any advice regarding the task as a whole, please let me know.

Y0hno

1 Answer


This is due to changes in selenium 4.10.0: https://github.com/SeleniumHQ/selenium/commit/9f5801c82fb3be3d5850707c46c3f8176e3ccd8e


Note that executable_path was removed.

If you want to pass in an executable_path, you'll have to use the service arg now.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(executable_path='./chromedriver.exe')
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=service, options=options)
# ...
driver.quit()
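
For completeness, here is a minimal sketch of how that fix slots into the script from the question, keeping the headless option and the chromedriver path from the original post (assuming that path is valid on your machine):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

# Path and options copied from the question
chromedriver_path = "C:\\Users\\5mxz2\\Downloads\\chromedriver\\chromedriver"
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")  # comment out to watch the browser

# Selenium 4.10+ style: the driver path goes through Service(), not executable_path
service = Service(executable_path=chromedriver_path)
driver = webdriver.Chrome(service=service, options=chrome_options)

driver.get("https://www.yelp.com/biz/gelati-celesti-virginia-beach-2")
soup = BeautifulSoup(driver.page_source, "html.parser")  # parse as before
driver.quit()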
Michael Mintz
  • Thank you for telling me about that, but it seems I've run into another problem: I can't get past the chrome binary path issue https://stackoverflow.com/q/76575713/22141623 – Y0hno Jun 28 '23 at 18:11
    If you remove the entire `executable_path='./chromedriver.exe'` part so that it's just `service = Service()`, then selenium will now automatically take care of the path for you. – Michael Mintz Jun 28 '23 at 18:16
  • I appreciate your help. I was able to get around the webdriver/chrome binary path steps by using my local machine, but now I seem to be running into a new problem. https://stackoverflow.com/q/76583472/22141623 – Y0hno Jun 29 '23 at 17:52
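
Following up on the comment above about using a bare Service(): with Selenium 4.10+ you can leave out the driver path entirely and let Selenium Manager locate (or download) a matching chromedriver for you. A minimal sketch, assuming Chrome is installed locally:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# No executable_path here: Selenium Manager resolves a matching chromedriver automatically
service = Service()
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # optional, same flag as in the question

driver = webdriver.Chrome(service=service, options=options)
driver.get("https://www.yelp.com/biz/gelati-celesti-virginia-beach-2")
print(driver.title)  # quick sanity check that the page loaded
driver.quit()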