
I want to scrape a whole website with Selenium. I found the class name used for a product name on Amazon, and I want to get all the product names that share that class name, without manually copying IDs or XPaths for each and every product. How can I do that?

What I tried with justdial.com, which worked:

from selenium import webdriver
from selenium.webdriver import ChromeOptions

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) ' \
             'Chrome/80.0.3987.132 Safari/537.36'

driver_exe = 'chromedriver'
options = ChromeOptions()
options.add_argument("--headless")
options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(options=options)

driver.get("https://www.justdial.com/Bangalore/Bakeries")
x = driver.find_elements_by_class_name("store-name")

for i in x:
    print(i.text)

What I tried with amazon.com:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) ' \
             'Chrome/80.0.3987.132 Safari/537.36'

driver_exe = 'chromedriver'
options = Options()
options.add_argument("--headless")
options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(executable_path=r"C:\Users\intel\Downloads\Setups\chromedriver.exe", options=options)
driver.get("https://www.amazon.com/s?k=asus&rh=n%3A300189&nav_sdd=aps&pd_rd_r=58b28d7d-1955-433b-b33b-b1b5dcf1f522&pd_rd_w=MJzan&pd_rd_wg=QG3cj&pf_rd_p=6d81377b-6d6c-4363-ae02-8fa202ed7b50&pf_rd_r=X0BDDAPN7TTW0ZT1REX6&qid=1583290662&ref=sxwds-sbc_c2")
x = driver.find_elements_by_class_name("a-size-medium a-color-base a-text-normal")

for i in list(x):
    print(i.text.strip())

What it shows as output with Amazon:

"A cookie associated with a cross-site resource at http://yahoo.com/ was set without the `SameSite` attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with `SameSite=None` and `Secure`. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source:
https://www.amazon.com/s?k=asus&rh=n%3A300189&nav_sdd=aps&pd_rd_r=58b28d7d-1955-433b-b33b-b1b5dcf1f522&pd_rd_w=MJzan&pd_rd_wg=QG3cj&pf_rd_p=6d81377b-6d6c-4363-ae02-8fa202ed7b50&pf_rd_r=X0BDDAPN7TTW0ZT1REX6&qid=1583290662&ref=sxwds-sbc_c2 (0)
[0306/071643.332:INFO:CONSOLE(0)] "A cookie associated with a cross-site resource at https://yahoo.com/ was set without the `SameSite` attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with `SameSite=None` and `Secure`. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source: https://www.amazon.com/s?k=asus&rh=n%3A300189&nav_sdd=aps&pd_rd_r=58b28d7d-1955-433b-b33b-b1b5dcf1f522&pd_rd_w=MJzan&pd_rd_wg=QG3cj&pf_rd_p=6d81377b-6d6c-4363-ae02-8fa202ed7b50&pf_rd_r=X0BDDAPN7TTW0ZT1REX6&qid=1583290662&ref=sxwds-sbc_c2 (0)
[0306/071644.820:INFO:CONSOLE(0)] "A cookie associated with a cross-site resource at https://surveywall-api.survata.com/ was set without the `SameSite` attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with `SameSite=None` and `Secure`. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source: https://www.amazon.com/s?k=asus&rh=n%3A300189&nav_sdd=aps&pd_rd_r=58b28d7d-1955-433b-b33b-b1b5dcf1f522&pd_rd_w=MJzan&pd_rd_wg=QG3cj&pf_rd_p=6d81377b-6d6c-4363-ae02-8fa202ed7b50&pf_rd_r=X0BDDAPN7TTW0ZT1REX6&qid=1583290662&ref=sxwds-sbc_c2 (0)

I just want one Selenium script that will work with most sites. Any help would be appreciated.

Abhay Salvi
  • Seemingly related: https://stackoverflow.com/questions/59787776/how-to-set-chrome-experimental-option-same-site-by-default-cookie-in-python-sele – AMC Mar 06 '20 at 02:48
  • Will this remove those lines, and will it work if I apply the code to my Amazon code? – Abhay Salvi Mar 06 '20 at 02:52

1 Answer


.find_elements_by_class_name accepts only a single class name. If the element has multiple class names, use .find_elements_by_css_selector instead and join the class names with dots.

Try this:

x = driver.find_elements_by_css_selector(".a-size-medium.a-color-base.a-text-normal")
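A fuller sketch of the same idea, generalized so one script can take any URL and any class attribute copied from the browser's inspector. It assumes Selenium 4, where the find_elements_by_* helpers were deprecated (and later removed) in favour of find_elements(By..., ...), and it adds an explicit wait so results have rendered before they are read. The function names class_names_to_css and scrape_texts are hypothetical, the helper assumes plain class names that need no CSS escaping, and Amazon's class names may well have changed since 2020.

```python
from typing import List


def class_names_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Example: "a-size-medium a-color-base a-text-normal"
          -> ".a-size-medium.a-color-base.a-text-normal"
    Assumes plain class names that need no CSS escaping.
    """
    classes = class_attr.split()  # split() with no argument also drops extra whitespace
    if not classes:
        raise ValueError("at least one class name is required")
    return "".join("." + name for name in classes)


def scrape_texts(url: str, class_attr: str, timeout: float = 10.0) -> List[str]:
    """Return the non-empty text of every element that has all the given classes."""
    # Imported here so class_names_to_css() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    selector = class_names_to_css(class_attr)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching element exists (raises TimeoutException otherwise).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        texts = (el.text.strip() for el in driver.find_elements(By.CSS_SELECTOR, selector))
        return [t for t in texts if t]
    finally:
        driver.quit()
```

For example, scrape_texts("https://www.justdial.com/Bangalore/Bakeries", "store-name") mirrors the justdial attempt, and passing "a-size-medium a-color-base a-text-normal" with the Amazon search URL covers the Amazon case. Sites that block headless browsers or load results differently may still need per-site handling.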
frianH
  • Yeah, it solved most of the problem, but sometimes it prints the same error message in between and then prints the rest of the product names. @frianH – Abhay Salvi Mar 06 '20 at 06:08
  • I mean the log lines I posted under the Amazon output in my question. – Abhay Salvi Mar 06 '20 at 06:22
  • At first it printed only those CORS Policy lines; after I copied your code it still shows them twice, but the scraped data is printed too. – Abhay Salvi Mar 06 '20 at 06:23
  • @AndrewWhiteman Sorry, I don't know about the **CORS Policy** issue; I tried it and didn't get that error message. Please create another thread to solve this issue. – frianH Mar 06 '20 at 07:05