
I'm trying to gather some information from certain webpages using Selenium and Python. I have working code for a single tab. But now I have a situation where I need to open 50 tabs in Chrome at once and process the data on each page.

1) Open 50 tabs at once - I already have the code for this.
2) Switch control between tabs, process the information from the page, close the tab, move to the next tab, and do the same there.

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.common.exceptions import TimeoutException
import psycopg2
import os
import datetime

final_results=[]
positions=[]
saerched_url=[]

options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
#options.add_argument('--headless')
options.add_argument("--incognito")
browser = webdriver.Chrome(executable_path='/users/user_123/downloads/chrome_driver/chromedriver', options=options)
browser.implicitly_wait(20)

#def db_connect():
try:
     DSN = "dbname='postgres' user='postgres' host='localhost' password='postgres' port='5432'"
     TABLE_NAME = 'staging.search_url'
     conn = psycopg2.connect(DSN)
     print("Database connected...")
     cur = conn.cursor()
     cur.execute("SET datestyle='German'")
except (Exception, psycopg2.Error) as error:
     print('database connection failed')
     quit()

def get_products(url):
    browser.get(url)
    names = browser.find_elements_by_xpath("//span[@class='pymv4e']")
    upd_product_name_list=list(filter(None, names))
    product_name = [x.text for x in upd_product_name_list]
    product = [x for x in product_name if len(x.strip()) > 2]
    upd_product_name_list.clear()
    product_name.clear()
    return product


links = ['https://www.google.com/search?q=Vitamin+D',
'https://www.google.com/search?q=Vitamin+D3',
'https://www.google.com/search?q=Vitamin+D+K2',
'https://www.google.com/search?q=D3',
'https://www.google.com/search?q=Vitamin+D+1000']

for link in links:
    # optional: we can wait for the new tab to open by comparing window handles count before & after
    tabs_count_before = len(browser.window_handles)

    # open a link
    control_string = "window.open('{0}')".format(link)
    browser.execute_script(control_string)

    # optional: wait for windows count to increment to ensure new tab is opened
    WebDriverWait(browser, 1).until(lambda browser: tabs_count_before != len(browser.window_handles))

    # get list of currently opened tabs
    tabs_list = browser.window_handles
    print(tabs_list)
    # switch control to newly opened tab (the last one in the list)
    last_tab_opened = tabs_list[-1]
    browser.switch_to.window(last_tab_opened)

    # now you can process data on the newly opened tab
    print(browser.title)


for lists in tabs_list:
    last_tab_opened = tabs_list[-1]
    browser.switch_to.window(last_tab_opened)
    filtered=[]
    filtered.clear()
    filtered = get_products(link)
    saerched_url.clear()
    if not filtered:
        # retry with 'kaufen' appended to the search query when nothing was found
        new_url=link+'+kaufen'
        filtered = get_products(new_url)
        print('Modified URL: '+new_url)

    if filtered:
        print(filtered)
        positions.clear()
        for x in range(1, len(filtered)+1):
            positions.append(str(x))
            saerched_url.append(link)

        gobal_position=0
        gobal_position=len(positions)
        print('global position first: '+str(gobal_position))
        print("\n")

        company_name_list = browser.find_elements_by_xpath("//div[@class='LbUacb']")
        company = []
        company.clear()
        company = [x.text for x in company_name_list]
        print('Company Name:')
        print(company, '\n')


        price_list = browser.find_elements_by_xpath("//div[@class='e10twf T4OwTb']")
        price = []
        price.clear()
        price = [x.text for x in price_list]
        print('Price:')
        print(price)
        print("\n")

        urls=[]
        urls.clear()
        find_href = browser.find_elements_by_xpath("//a[@class='plantl pla-unit-single-clickable-target clickable-card']")
        for my_href in find_href:
            url_list=my_href.get_attribute("href")
            urls.append(url_list)

        print('Final Result: ')
        result = zip(positions,filtered, urls, company,price,saerched_url)
        final_results.clear()
        final_results.append(tuple(result))
        print(final_results)
        print("\n")


        print('global position end: '+str(gobal_position))
        i=0
        try:
            for d in final_results:
                while i < gobal_position:
                    print(d[i])
                    cur.execute("""INSERT into staging.pla_crawler_results(position, product_name, url,company,price,searched_url) VALUES (%s, %s, %s,%s, %s,%s)""", d[i])
                    print('Inserted successfully')
                    conn.commit()
                    i=i+1
        except (Exception, psycopg2.Error) as error:
            print(error)
            pass


    browser.close()

1 Answer


Ideally you shouldn't attempt to open 50 tabs at once: every extra tab adds memory and CPU overhead, and Selenium can only interact with one tab (the one that currently has focus) at a time, so the other 49 would just sit idle.


Solution

If you have a list of URLs as follows:

['https://selenium.dev/downloads/', 'https://selenium.dev/documentation/en/']

You can iterate over the list and open the URLs one by one, each in a fresh browser session, scrape the page, and then quit that session before moving on:

  • Code Block:

    from selenium import webdriver
    
    links = ['https://selenium.dev/downloads/', 'https://selenium.dev/documentation/en/']
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    for link in links:
        driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
        driver.get(link)
        print(driver.title)
        print("Perform webscraping here")
        driver.quit()
    print("End of program")
    
  • Console Output:

    Downloads
    Perform webscraping here
    The Selenium Browser Automation Project :: Documentation for Selenium
    Perform webscraping here
    End of program
    
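If launching a fresh Chrome instance for every URL turns out to be too slow, the same single-tab idea can be kept while reusing one driver and simply navigating it from link to link. A minimal sketch, assuming the same links, options and chromedriver path as in the code block above:

    from selenium import webdriver

    links = ['https://selenium.dev/downloads/', 'https://selenium.dev/documentation/en/']
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    # one driver instance, reused for every URL
    driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    for link in links:
        driver.get(link)  # navigate the existing tab instead of starting a new browser
        print(driver.title)
        print("Perform webscraping here")
    driver.quit()
    print("End of program")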

  • @Sandeep Check out the answer update and let me know the status. – undetected Selenium Jan 24 '20 at 11:30
  • Many thanks for this effort :) I have checked this code: it opens each link, processes it, closes the browser, and then opens a new Chrome instance for the next link until the end. But that is not the result we want. We need to open 50 tabs at once, and that part is fine; the issue is that after opening those 50 tabs it only processes the last active tab, and at the end, when I call "browser.close()", I get a "NoSuchWindowException". – Sandeep Jan 24 '20 at 11:44
  • So I need a solution for that: close the last active window, move control to the next tab, process it, close it, and move on to the next tab in the same way. Any suggestions on what I need to do to achieve this? – Sandeep Jan 24 '20 at 11:45 (a sketch of this handle-switching pattern is added after this thread)
  • Try using multiprocessing or multithreading. – Jawad Ahmad Khan Nov 27 '21 at 08:23 (a multiprocessing sketch is added after this thread)
  • @DebanjanB He was asking for simultaneously controlling all the tabs, which can only be done with a multithreaded or multiprocessing approach, while the answer focuses on a single-tab approach which is irrelevant to the question asked. – Jawad Ahmad Khan Nov 27 '21 at 08:44
  • @JawadAhmadKhan Did you go through the complete answer at least once? Didn't you find the rationale for why I suggested a single tab instead of nearly 50 odd tabs? I think my suggestion was backed by some solid reasoning. Your thoughts please... – undetected Selenium Nov 27 '21 at 08:51
  • I understand it, but I believe it's better to let someone else answer using the multiprocessing approach, which solves the exact problem someone is having, instead of an answer which doesn't help at all. I am writing an answer shortly; it may help. – Jawad Ahmad Khan Nov 27 '21 at 08:55
  • @JawadAhmadKhan Isn't _multiprocessing_ just another approach similar to a single tab iterating over multiple tabs? In the question I don't even see that _multiprocessing_ was a mandatory requirement. – undetected Selenium Nov 27 '21 at 09:00
  • Yes, but it works simultaneously, without any need to focus on a tab or switch between them. – Jawad Ahmad Khan Nov 27 '21 at 09:04
  • So you mean to say _Selenium_ works even without focus on specific tabs? Strange enough !!! – undetected Selenium Nov 27 '21 at 09:06
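For the workflow described in the comments above (open all the tabs first, then visit each one, scrape it, and close it), the NoSuchWindowException can be avoided by iterating over browser.window_handles, switching to a handle before reading the page, and switching back to a still-open handle after every close(). A minimal sketch, assuming the browser, links list, implicit wait and XPath from the question (Selenium 3 API):

    # open every link in its own tab first
    for link in links:
        browser.execute_script("window.open('{0}')".format(link))
        # (the WebDriverWait handle-count check from the question can be kept here)

    # the slice materialises a snapshot of the handles; index 0 is the original blank tab
    for handle in browser.window_handles[1:]:
        browser.switch_to.window(handle)                      # focus the tab before reading it
        names = browser.find_elements_by_xpath("//span[@class='pymv4e']")
        products = [x.text for x in names if len(x.text.strip()) > 2]
        print(browser.current_url, products)
        browser.close()                                       # close the tab just processed
        browser.switch_to.window(browser.window_handles[0])   # keep the driver on a live window

    browser.quit()

Switching back to window_handles[0] after every close() is what keeps the driver pointed at a live window and prevents the NoSuchWindowException; the implicitly_wait(20) already set in the question gives each tab time to render before its elements are read.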
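For the multiprocessing approach suggested in the comments, each worker process owns its own Chrome instance, so there is no tab focus to manage at all. A minimal sketch only: the scrape() helper, the chromedriver path and the pool size are assumptions, and the XPath is carried over from the question:

    from multiprocessing import Pool

    from selenium import webdriver

    def scrape(url):
        # every worker creates and owns its own browser, so nothing is shared between processes
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        driver = webdriver.Chrome(executable_path='/path/to/chromedriver', options=options)  # placeholder path
        try:
            driver.get(url)
            names = driver.find_elements_by_xpath("//span[@class='pymv4e']")
            return url, [x.text for x in names if len(x.text.strip()) > 2]
        finally:
            driver.quit()

    if __name__ == '__main__':
        links = ['https://www.google.com/search?q=Vitamin+D',
                 'https://www.google.com/search?q=Vitamin+D3']
        with Pool(processes=2) as pool:  # keep the pool well below 50 to limit memory use
            for url, products in pool.map(scrape, links):
                print(url, products)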