
I want to scrape a whole website with Selenium. I found the class name used for product names on the site, and I just want to get all the product names under that one class name, without manually copying any IDs or XPaths for each product. I have tried this:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

driver_exe = 'chromedriver'
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r"C:\Users\intel\Downloads\Setups\chromedriver.exe", options=options)

driver.get("https://www.justdial.com/Bangalore/Bakeries")
x = driver.find_elements_by_class_name("store-name")

for i in x:
    print(i.text)

It's not displaying anything. Why? Answers that mix a parser like BeautifulSoup with Selenium are also accepted, but I want to use Selenium anyway.

Abhay Salvi

1 Answer


To bypass the "Access Denied" response in headless mode, use a different user agent. You can use your own user agent; google "my user agent" to get it.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# A regular desktop browser user agent; the default headless one gets blocked.
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) ' \
             'Chrome/80.0.3987.132 Safari/537.36'

options = Options()
options.add_argument("--headless")
options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(options=options)

driver.get("https://www.justdial.com/Bangalore/Bakeries")
stores = driver.find_elements_by_class_name("store-name")

for store in stores:
    print(store.text)
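
If you are running Selenium 4 or later, note that the find_elements_by_* helpers have been removed, so the same approach needs a By locator; an explicit wait also helps in case the listings are rendered after the initial page load. A minimal sketch under those assumptions (the store-name class is taken from the question and may have changed on the live site):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) ' \
             'Chrome/80.0.3987.132 Safari/537.36'

options = Options()
options.add_argument("--headless")
options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(options=options)

driver.get("https://www.justdial.com/Bangalore/Bakeries")

# Wait up to 10 seconds for at least one store name to be present in the DOM.
stores = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "store-name"))
)
for store in stores:
    print(store.text)

driver.quit()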

Using requests and BeautifulSoup:

from bs4 import BeautifulSoup
import requests

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) ' \
             'Chrome/80.0.3987.132 Safari/537.36'

response = requests.get('https://www.justdial.com/Bangalore/Bakeries', headers={'user-agent': user_agent})
soup = BeautifulSoup(response.text, 'lxml')
stores = soup.select('.store-name')
for store in stores:
    print(store.text.strip())

Output:

Big Mishra Pedha
Just Bake
The Cake Factory
Queen Of Cakeland
SREENIVASA BRAHMINS BAKERY ..
Aubree Eat Play Love Chocol..
Jain Bakes
Ammas Pastries
Facebake
Holige Mane Brahmins Bakery
Sers
  • Well, can't we just mix this code with Selenium, or can you correct my Selenium code? @Sers – Abhay Salvi Mar 05 '20 at 16:32
  • You can do it all with requests; it's a faster/lighter/better solution for scraping. For Selenium I updated the answer – Sers Mar 05 '20 at 16:35
  • Well, while doing it, it also shows me ```"Access to image at 'https://akam.cdn.jdmagicbox.com/images/icontent/jdrwd/drop_down_arrow.svg' from origin 'https://www.justdial.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.", source: https://www.justdial.com/Bangalore/Bakeries (0) [0305/223220.362:INFO:CONSOLE(425)] "1", source: https://static.127777.com/mnbldr/?b=jd_rwd/public/minjs&g=cmnjs,usrjs,envtjs,tranlatejs,rsljs&ver_=46.80 (425)``` – Abhay Salvi Mar 05 '20 at 17:01