0

This is my first time working with Selenium and web scraping. I am trying to get the menu items and prices for a certain restaurant in California from the following website (https://www.fastfoodmenuprices.com/baskin-robbins-prices/). I can use Selenium to select California from the dropdown menu, but I cannot scrape the menu items and prices, and I keep ending up with an empty data frame. How do I scrape the menu items and prices from this website and store them in a data frame? My code is below:

from selenium import webdriver
import time 
from selenium.webdriver.support.ui import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import pandas as pd
from bs4 import BeautifulSoup

path = "/path/to/chromedriver"

driver = webdriver.Chrome(executable_path = path)
url = "https://www.fastfoodmenuprices.com/baskin-robbins-prices/"
driver.get(url)
Select(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, "//select[@class='tp-variation']")))).select_by_value("MS4yOA==")

print(driver.page_source)
driver.quit

menu = []
prices = []

content = driver.page_source
soup = BeautifulSoup (content, features = "html.parser")

for element in soup.findAll('div', attrs = {'tbody': 'row-hover'}):
    menu = element.find ('td', attrs = {'class': "column-1"})
    prices = element.find('td', attrs = {'class':'column-3'})
    menu.append(menu.text)
    prices.append(prices.text)

df = pd.DataFrame({'Menu Item':menu, 'Prices':prices})
df
bkim274

3 Answers

1

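You can do this without Selenium. The script below fetches the page with requests, reads the base prices from the table, and multiplies them by the California multiplier, which is stored base64-encoded in the value of the dropdown option. Try: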

import requests
import base64
import pandas as pd
from bs4 import BeautifulSoup


url = "https://www.fastfoodmenuprices.com/baskin-robbins-prices/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

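data = []
# Keep only real menu rows: they have all three columns plus an <input>.
# Section-header rows (a single colspan="3" cell) do not match.
for row in soup.select(
    "tr:has(.column-1):has(.column-2):has(.column-3):has(input)"
):
    data.append(
        {
            # Text of the nearest preceding section-header cell.
            "Type": row.find_previous(colspan="3").get_text(strip=True),
            "Food": row.select_one(".column-1").get_text(strip=True),
            "Size": row.select_one(".column-2").get_text(strip=True),
            "Price": float(
                row.select_one(".column-3").get_text(strip=True).strip("$")
            ),
        }
    )


# The option's value attribute holds the price multiplier, base64-encoded.
adjust = soup.select_one('.tp-variation option:-soup-contains("California")')
adjust = float(base64.b64decode(adjust["value"]))

df = pd.DataFrame(data)
# Scale the base prices to California prices.
df["Price"] = (df["Price"] * adjust).round(2)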

print(df)
df.to_csv("data.csv", index=False)

Prints:

Type Food Size Price
0 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Soft Serve Below Mini 2.8
1 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Soft Serve Below Small 4.84
2 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Soft Serve Below Medium 5.61
3 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Soft Serve Below Large 7.65
4 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Cups & Cones Kids 2.02
5 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Cups & Cones Regular 2.53
6 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Cups & Cones Large 3.81
7 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Parfaits Mini 2.8
8 Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough Parfaits Regular 6.39
9 Sundaes Banana Royale 7.03
10 Sundaes Brownie 7.03
11 Sundaes Banana Split 8.56
12 Sundaes Reese’s Peanut Butter Cup Sundae 7.67
13 Sundaes Chocolate Chip Cookie Dough Sundae 7.67
14 Sundaes Oreo® Layered Sundae 7.67
15 Sundaes Made with Snickers Sundae 7.67
16 Sundaes One Scoop Sundae 4.47
17 Sundaes Two Scoops Sundae 5.75
18 Sundaes Three Scoops Sundae 6.64
19 Sundaes Candy Topping 1.01
20 Sundaes Waffle Bowl 1.27
21 Ice Cream Kid’s Scoop 2.8
22 Ice Cream Single Scoop 3.57
23 Ice Cream Double Scoop 5.11
24 Ice Cream Regular Waffle Cone 1.27
25 Ice Cream Chocolate Waffle Cone 1.91
26 Ice Cream Fancy Waffle Cone 1.91
27 Beverages Cappuccino Blast Mini 4.72
28 Beverages Cappuccino Blast Small 6
29 Beverages Cappuccino Blast Medium 7.28
30 Beverages Cappuccino Blast Large 8.56
31 Beverages Iced Cappy Blast Mini 4.72
32 Beverages Iced Cappy Blast Small 6
33 Beverages Iced Cappy Blast Medium 7.28
34 Beverages Iced Cappy Blast Large 8.56
35 Beverages Add a Boost (Cappuccino or Iced Cappy Blast) 0.64
36 Beverages Smoothie Mini 4.72
37 Beverages Smoothie Small 6
38 Beverages Smoothie Medium 7.28
39 Beverages Smoothie Large 8.56
40 Beverages Shake Mini 4.72
41 Beverages Shake Small 6
42 Beverages Shake Medium 7.28
43 Beverages Shake Large 8.56
44 Ice Cream To Go Pre-Packed Quart 7.67
45 Ice Cream To Go Hand-Packed Pint 6.39
46 Ice Cream To Go Hand-Packed Quart 10.23
47 Ice Cream To Go Clown Cones 3.7

and creates data.csv.
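
To make the multiplier step concrete, here is a minimal sketch that decodes the option value used in the question's `select_by_value("MS4yOA==")` call. The base price of 2.19 is a hypothetical number chosen only to illustrate the arithmetic; it is not taken from the page.

```python
import base64

# Option value that the question passes to select_by_value() for California.
option_value = "MS4yOA=="

# The value is plain text encoded as base64; decoding gives the string "1.28".
decoded = base64.b64decode(option_value).decode("ascii")
multiplier = float(decoded)

# Hypothetical base price, used only to show how a listed price is scaled.
base_price = 2.19
adjusted = round(base_price * multiplier, 2)

print(decoded, multiplier, adjusted)
```

Because the multiplier is available in the static HTML, the adjusted prices can be computed without driving a browser.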

Andrej Kesely
0

The website is behind Cloudflare protection:

https://www.fastfoodmenuprices.com/baskin-robbins-prices/ is using Cloudflare CDN/Proxy!

https://www.fastfoodmenuprices.com/baskin-robbins-prices/ is using Cloudflare SSL!

So I use the following options to avoid being detected as an automated browser:

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Pass both switches in one list; a second call with the same key overwrites the first.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')

To select the table rows and cells (tr, td), I use CSS selectors, which are more robust and flexible.

I pass list(zip(...)) to the pandas DataFrame because the two lists do not have the same shape.

I use try/except because, as you will see, some menu items are missing.

Script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd
from bs4 import BeautifulSoup


options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Pass both switches in one list; a second call with the same key overwrites the first.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')

# Recent Selenium 4 releases expect the driver path via a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

url = "https://www.fastfoodmenuprices.com/baskin-robbins-prices/"
driver.get(url)

# Wait for the dropdown, then pick the option value the question uses for California.
Select(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, "//select[@class='tp-variation']")))).select_by_value("MS4yOA==")

price=[]
menu=[]

# Parse the rendered page, then shut the browser down.
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

for element in soup.select('#tablepress-34 tbody tr'):
    try:
        menus = element.select_one('td:nth-child(2)').text
        menu.append(menus)
    except AttributeError:
        # select_one() returned None: this row has no second cell.
        pass
    try:
        prices = element.select_one('td:nth-child(3) span').text
        price.append(prices)
    except AttributeError:
        # select_one() returned None: this row has no price span.
        pass

df = pd.DataFrame(data=list(zip(price,menu)),columns=['price','menu'])
print(df)

The script uses the webdriver-manager package to install a matching chromedriver.

Output:

    price      menu
0    $2.80     Mini
1    $4.84    Small
2    $5.61   Medium
3    $7.65    Large
4    $2.02     Kids
5    $2.53  Regular
6    $3.81    Large
7    $2.80     Mini
8    $6.39  Regular
9    $7.03
10   $7.03
11   $8.56
12   $7.67
13   $7.67
14   $7.67
15   $7.67
16   $4.47
17   $5.75
18   $6.64
19   $1.01
20   $1.27
21   $2.80
22   $3.57
23   $5.11
24   $1.27
25   $1.91
26   $1.91
27   $4.72     Mini
28   $6.00    Small
29   $7.28   Medium
30   $8.56    Large
31   $4.72     Mini
32   $6.00    Small
33   $7.28   Medium
34   $8.56    Large
35   $0.64
36   $4.72     Mini
37   $6.00    Small
38   $7.28   Medium
39   $8.56    Large
40   $4.72     Mini
41   $6.00    Small
42   $7.28   Medium
43   $8.56    Large
44   $7.67    Quart
45   $6.39     Pint
46  $10.23    Quart
47   $3.70
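
The note above about `list` and `zip` deserves one caution: `zip()` pairs values purely by position and stops at the shorter list, so two independently filled lists can drift apart if a row contributes to one list but not the other. The following sketch uses hypothetical row data (plain dictionaries standing in for parsed table rows, not the real page) to show the difference between zipping two lists and building one record per row.

```python
# Hypothetical stand-ins for parsed table rows; the second row has no size cell.
table_rows = [
    {"size": "Mini", "price": "$2.80"},
    {"price": "$7.03"},
    {"size": "Mini", "price": "$4.72"},
]

# Approach 1: fill two lists independently, then zip them.
prices, sizes = [], []
for row in table_rows:
    if "price" in row:
        prices.append(row["price"])
    if "size" in row:
        sizes.append(row["size"])

# zip() truncates to the shorter list and pairs by position only,
# so "$7.03" is paired with a size that belongs to a different row.
misaligned = list(zip(prices, sizes))

# Approach 2: build one record per row, using None for a missing cell.
aligned = [(row.get("price"), row.get("size")) for row in table_rows]

print(misaligned)
print(aligned)
```

In the output shown above the size cells are present but empty, so both lists stay the same length; the per-row approach is simply the safer pattern if that ever changes.
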
Md. Fazlul Hoque
0

After selecting California, wait for the table to become visible using WebDriverWait with visibility_of_element_located(), then pass the table's outerHTML to pandas read_html(). You can use the following locator strategies:

  • Code Block:

    from io import StringIO

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import Select, WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    import pandas as pd
    
    options = Options()
    options.add_argument("start-maximized")
    # Pass both switches in one list; a second call with the same key overwrites the first.
    options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    s = Service('C:\\BrowserDrivers\\chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=options)
    driver.get("https://www.fastfoodmenuprices.com/baskin-robbins-prices")
    # Wait for the dropdown, then pick the option value the question uses for California.
    Select(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//select[@class='tp-variation']")))).select_by_value("MS4yOA==")
    # Wait for the table and grab its HTML.
    tabledata = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='tablepress-34']"))).get_attribute("outerHTML")
    # read_html() returns a list of DataFrames; newer pandas versions warn
    # when given a raw HTML string, so wrap it in StringIO.
    tabledf = pd.read_html(StringIO(tabledata))
    print(tabledf)
    driver.quit()
    
  • Console Output:

    [                                                 Food  ...                                              Price
    0   Soft Serve Flavors: Reese’s, Heath, Snickers, ...  ...  Soft Serve Flavors: Reese’s, Heath, Snickers, ...
    1                                    Soft Serve Below  ...                                              $2.80
    2                                    Soft Serve Below  ...                                              $4.84
    3                                    Soft Serve Below  ...                                              $5.61
    4                                    Soft Serve Below  ...                                              $7.65
    5                                        Cups & Cones  ...                                              $2.02
    6                                        Cups & Cones  ...                                              $2.53
    7                                        Cups & Cones  ...                                              $3.81
    8                                            Parfaits  ...                                              $2.80
    9                                            Parfaits  ...                                              $6.39
    10                                            Sundaes  ...                                            Sundaes
    11                                      Banana Royale  ...                                              $7.03
    12                                            Brownie  ...                                              $7.03
    13                                       Banana Split  ...                                              $8.56
    14                   Reese’s Peanut Butter Cup Sundae  ...                                              $7.67
    15                 Chocolate Chip Cookie Dough Sundae  ...                                              $7.67
    16                               Oreo® Layered Sundae  ...                                              $7.67
    17                          Made with Snickers Sundae  ...                                              $7.67
    18                                   One Scoop Sundae  ...                                              $4.47
    19                                  Two Scoops Sundae  ...                                              $5.75
    20                                Three Scoops Sundae  ...                                              $6.64
    21                                      Candy Topping  ...                                              $1.01
    22                                        Waffle Bowl  ...                                              $1.27
    23                                          Ice Cream  ...                                          Ice Cream
    24                                        Kid’s Scoop  ...                                              $2.80
    25                                       Single Scoop  ...                                              $3.57
    26                                       Double Scoop  ...                                              $5.11
    27                                Regular Waffle Cone  ...                                              $1.27
    28                              Chocolate Waffle Cone  ...                                              $1.91
    29                                  Fancy Waffle Cone  ...                                              $1.91
    30                                          Beverages  ...                                          Beverages
    31                                   Cappuccino Blast  ...                                              $4.72
    32                                   Cappuccino Blast  ...                                              $6.00
    33                                   Cappuccino Blast  ...                                              $7.28
    34                                   Cappuccino Blast  ...                                              $8.56
    35                                   Iced Cappy Blast  ...                                              $4.72
    36                                   Iced Cappy Blast  ...                                              $6.00
    37                                   Iced Cappy Blast  ...                                              $7.28
    38                                   Iced Cappy Blast  ...                                              $8.56
    39       Add a Boost (Cappuccino or Iced Cappy Blast)  ...                                              $0.64
    40                                           Smoothie  ...                                              $4.72
    41                                           Smoothie  ...                                              $6.00
    42                                           Smoothie  ...                                              $7.28
    43                                           Smoothie  ...                                              $8.56
    44                                              Shake  ...                                              $4.72
    45                                              Shake  ...                                              $6.00
    46                                              Shake  ...                                              $7.28
    47                                              Shake  ...                                              $8.56
    48                                    Ice Cream To Go  ...                                    Ice Cream To Go
    49                                         Pre-Packed  ...                                              $7.67
    50                                        Hand-Packed  ...                                              $6.39
    51                                        Hand-Packed  ...                                             $10.23
    52                                        Clown Cones  ...                                              $3.70
    
    [53 rows x 3 columns]]
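
The console output above still contains the section-header rows (for example "Sundaes" repeated in every column) and prices stored as strings. As a hedged sketch of one possible clean-up, the example below works on a small hand-made frame that mimics that shape. The middle column name `Size` is an assumption, since the console output elides it, and the new `Type` column is an illustrative addition.

```python
import pandas as pd

# Hypothetical miniature of the frame returned by pd.read_html(...)[0].
# Section-header rows repeat one label across every column.
raw = pd.DataFrame(
    {
        "Food": ["Sundaes", "Banana Royale", "Brownie", "Ice Cream", "Single Scoop"],
        "Size": ["Sundaes", None, None, "Ice Cream", None],  # column name assumed
        "Price": ["Sundaes", "$7.03", "$7.03", "Ice Cream", "$3.57"],
    }
)

# A header row carries the same text in Food and Price; menu rows do not.
is_header = raw["Food"] == raw["Price"]

# Carry the most recent header label down as a category column.
raw["Type"] = raw["Food"].where(is_header).ffill()

# Drop the header rows and turn "$7.03" into the float 7.03.
menu = raw.loc[~is_header].copy()
menu["Price"] = menu["Price"].str.lstrip("$").astype(float)
menu = menu.reset_index(drop=True)

print(menu)
```

With the real data, the same steps would be applied to `tabledf[0]`, since `read_html()` returns a list of frames.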
    
undetected Selenium