I am unable to scrape the "View Details" button links as a list from the page https://www.bmstores.co.uk/stores?location=KA8+9BF. I have tried both BeautifulSoup and Selenium in multiple ways. With Selenium I used the find_element methods with XPath, CSS selector, and class name, but nothing worked. Selenium also ran into a pop-up on the site, which I resolved with a pop-up blocker.

I searched various sites but only found the same BeautifulSoup Python code, which did not let me complete the task. When I run my code, I repeatedly get these two errors:

1. ElementNotInteractableException: element not interactable
2. NoSuchElementException: Message: no such element: Unable to locate element

My code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
from selenium import webdriver as wd
import time
from selenium.common.exceptions import WebDriverException

local_path_of_chrome_driver = "E:\\chromedriver.exe"
driver = wd.Chrome(executable_path=local_path_of_chrome_driver)
driver.maximize_window()

data_links=[]

xpaths = [
    "/html/body/div[9]/div/div/div/div/ul/li[1]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[2]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[4]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[5]/div/div[2]/a[1]",
]
for j in xpaths:
    try:
        driver.find_element_by_xpath(j).click()
        time.sleep(3)
        driver.switch_to_window(driver.window_handles[-1])
        data_links.append(driver.current_url)
        time.sleep(3)
        driver.back()
    except:
        pass

driver.close()

Can someone help me out?

2 Answers

To collect the View Details links as a list from https://www.bmstores.co.uk/stores?location=KA8+9BF, wait for the links to become visible with WebDriverWait and read each element's href attribute instead of clicking it. You can use the following locator strategy:

  • Code Block:

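    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    view_details = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.LINK_TEXT, "View Details")))
    for i in view_details:
        print(i.get_attribute("href"))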
    
  • Console output:

    https://www.bmstores.co.uk/stores/ayr-heathfield-retail-park-90
    https://www.bmstores.co.uk/stores/prestwick-113
    https://www.bmstores.co.uk/stores/irvine-307
    https://www.bmstores.co.uk/stores/kilmarnock-310
    https://www.bmstores.co.uk/stores/stevenston-319
    https://www.bmstores.co.uk/stores/darnley-414
    https://www.bmstores.co.uk/stores/east-kilbride-304
    https://www.bmstores.co.uk/stores/paisley-linwood-423
    https://www.bmstores.co.uk/stores/linwood-hart-street-33
    https://www.bmstores.co.uk/stores/paisley-renfrew-road-428
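
For completeness, here is a minimal sketch of the whole flow as one function: open the page, wait for the links, and return their href values. The names `clean_links` and `collect_view_details_links` are made up for illustration, and the sketch assumes a chromedriver that Selenium can locate on its own (on PATH, or through Selenium Manager in recent Selenium 4 releases). It has not been run against the live site.

```python
def clean_links(hrefs):
    """Drop empty values and duplicates while keeping the original order."""
    seen = set()
    links = []
    for href in hrefs:
        if href and href not in seen:
            seen.add(href)
            links.append(href)
    return links


def collect_view_details_links(url, timeout=20):
    """Open `url` in Chrome and return the href of every "View Details" link."""
    # Imported here so clean_links() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: Selenium can find a matching chromedriver by itself.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until every matching link is visible (up to `timeout` seconds).
        anchors = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located((By.LINK_TEXT, "View Details"))
        )
        # Read the link target directly; no clicking or window switching.
        return clean_links(a.get_attribute("href") for a in anchors)
    finally:
        driver.quit()


# Example call (starts a real browser, so it is left commented out):
# links = collect_view_details_links(
#     "https://www.bmstores.co.uk/stores?location=KA8+9BF")
```

Reading `href` avoids the click, `switch_to_window`, and `back()` round trip from the question, which is where the "element not interactable" errors were raised.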
    
undetected Selenium

You can fetch every store name and its corresponding View Details link with the requests module alone, by posting to the Algolia search endpoint shown below. There are 24 stores in total.

import requests
from urllib.parse import urljoin

base = 'https://www.bmstores.co.uk'
link = 'https://mv7e2a3yql-dsn.algolia.net/1/indexes/*/queries'

params = {
    'x-algolia-agent': 'Algolia for JavaScript (3.35.0); Browser; instantsearch.js (3.6.0); JS Helper (2.28.0)',
    'x-algolia-application-id': 'MV7E2A3YQL',
    'x-algolia-api-key': 'Mzg2ZjM2ZmVmNzhiMmVhZjhhNjQ5ZDAzNGQ5NjE2MTQ1MDQ2ZDAwODBlMjY2YjFkNWFkOTUyOTZkNTRhY2M4MmZpbHRlcnM9JTI4c3RhdHVzJTNBYXBwcm92ZWQlMjkrQU5EK3B1Ymxpc2hkYXRlKyUzQysxNjM1NTAzMzI5K0FORCslMjhleHBpcnlkYXRlKyUzRSsxNjM1NTAzMzI5K09SK2V4cGlyeWRhdGUrJTNEKy0xJTI5',
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    s.headers['Referer'] = 'https://www.bmstores.co.uk/stores?location=KA8+9BF'
    
    page = 0
    
    while page<=3:
        payload = {"requests":[{"indexName":"prod_bmstores_stores","params":f"query=&hitsPerPage=10&page={page}&attributesToRetrieve=*&highlightPreTag=__ais-highlight__&highlightPostTag=__%2Fais-highlight__&getRankingInfo=true&aroundLatLng=55.47888%2C-4.59464&aroundRadius=50000&clickAnalytics=false&facets=%5B%22ranges%22%5D&tagFilters="}]}
        res = s.post(link,params=params,json=payload)
        for item in res.json()['results']:
            for container in item['hits']:
                store_name = container['storename']
                detail_link = urljoin(base,container['url'])
                print(store_name,detail_link)

        page+=1
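
If you would rather not hard-code the number of pages, one option is to keep requesting pages until one comes back with no hits. The sketch below separates the parsing from the network call so it can be tried offline. `extract_store_links`, `collect_all`, `fake_fetch`, and the sample data are all hypothetical; only the `results`, `hits`, `storename`, and `url` keys come from the code above. Stopping on an empty page is an assumption about how the endpoint behaves past the last page.

```python
from urllib.parse import urljoin

BASE = "https://www.bmstores.co.uk"


def extract_store_links(response_json, base=BASE):
    """Return (store name, absolute detail URL) pairs from one response body."""
    pairs = []
    for result in response_json.get("results", []):
        for hit in result.get("hits", []):
            pairs.append((hit["storename"], urljoin(base, hit["url"])))
    return pairs


def collect_all(fetch_page, max_pages=50):
    """Call fetch_page(0), fetch_page(1), ... until a page yields no stores."""
    stores = []
    for page in range(max_pages):
        pairs = extract_store_links(fetch_page(page))
        if not pairs:  # assumed signal that the results are exhausted
            break
        stores.extend(pairs)
    return stores


# Made-up, response-shaped data standing in for the real endpoint.
SAMPLE_PAGES = [
    {"results": [{"hits": [
        {"storename": "Example Store A", "url": "/stores/example-a-1"},
    ]}]},
    {"results": [{"hits": []}]},
]


def fake_fetch(page):
    """Stand-in for a function that posts to the endpoint and returns res.json()."""
    return SAMPLE_PAGES[page]


stores = collect_all(fake_fetch)
for name, link in stores:
    print(name, link)
```

To use it with the session above, `fetch_page` would wrap the `s.post(link, params=params, json=payload).json()` call, building `payload` from the page number.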
SIM