
I'm using Selenium in Python to grab a web page's text, including text that depends on JavaScript, but it uses Firefox (or another browser), opens a window and processes very slowly, around 30 seconds per page. Can I speed it up somehow?

Example of the code:

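    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    gecko_path = r'X:\Programming\geckodriver\geckodriver.exe'
    binary = r'C:\Program Files\Mozilla Firefox\firefox.exe'
    url = 'https://www.monster.de/jobs/q-administration-jobs.aspx?intcid=swoop_BrowseJobs_Administration'

    options = Options()
    options.binary = binary  # path to the Firefox executable

    xml_id = "JobDescription"
    xml_class = "details-content"

    # Selenium 3 style: options and geckodriver path passed as keyword arguments
    driver = webdriver.Firefox(firefox_options=options, executable_path=gecko_path)

    # get web page (a real browser loads it and runs its JavaScript)
    driver.get(url)

    # .text is a str; .encode() already returns bytes, so no bytes() wrapper is needed
    text = driver.find_element_by_class_name(xml_class).text.encode('utf-8')

    print(type(text))
    driver.quit()  # close the browser when done
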
user8426627
  • Did you check whether it takes the same time when you use `--headless` in Chrome? – supputuri Jun 03 '19 at 19:59
  • I don't know; Mozilla is still pretty slow headless. And with `binary_chrome = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'; options = Options(); options.binary = binary_chrome; options.headless = True; driver = webdriver.Chrome(chrome_options=options, executable_path=gecko_path); driver.get(url)` Chrome doesn't load the URL. What's wrong there? – user8426627 Jun 03 '19 at 20:11
  • `options = ChromeOptions(); options.add_argument("--headless"); driver = webdriver.Chrome(executable_path=binarypath, chrome_options=options)` – supputuri Jun 03 '19 at 20:37
  • Try that. It's ChromeOptions(). – supputuri Jun 03 '19 at 20:37
  • HtmlUnit is much faster, though its JavaScript support is not great (but getting better). – pcalkins Jun 03 '19 at 20:38
  • It still does not load the page; it just opens a Chrome window with the Google page. – user8426627 Jun 03 '19 at 20:46
  • Which website are you scraping? – KunduK Jun 03 '19 at 20:50
  • `url = r'https://www.monster.de/jobs/q-administration-jobs.aspx?intcid=swoop_BrowseJobs_Administration'; binary_chrome = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'; options = ChromeOptions(); options.add_argument("--headless"); options.binary_location = binary_chrome; driver = webdriver.Chrome(executable_path=binary_chrome, chrome_options=options)` (earlier attempts, now commented out: `options.add_argument('headless')` and `driver = webdriver.Chrome(chrome_options=options, executable_path = gecko_path)`) – user8426627 Jun 03 '19 at 20:58
  • Are you using any `time.sleep` calls or implicit waits in your code? Both can slow down execution. – Ardesco Jun 04 '19 at 07:38
  • Have you verified that the content you think is only available by letting JS run on the page is truly only available that way? Can you supply the URL and an indication of the required data? – QHarr Jun 04 '19 at 08:30
  • This is the URL above, https://www.monster.de/jobs/q-administration-jobs.aspx?intcid=swoop_BrowseJobs_Administration; the text inside is only visible after executing JS. – user8426627 Jun 04 '19 at 09:11

1 Answer


I don't think you can speed up Selenium tests much, because they kick off a real browser; you should therefore get more or less the same timings as if you opened the page in a normal browser.

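You can consider tools designed for web scraping, such as Scrapy or Beautiful Soup, which fetch and parse the HTML without launching a browser; this way you should be able to get the interesting text from the pages much faster. Note that neither executes JavaScript, so this only helps when the text is already present in the HTML the server returns.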
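A minimal sketch of the browser-free approach, assuming the text you need is in the raw HTML. It deliberately uses only the Python standard library (`urllib.request` and `html.parser`) as a stand-in for Scrapy or Beautiful Soup, so it is not either library's API; the helper names `fetch_html`, `extract_text_by_class` and `ClassTextExtractor`, and the `User-Agent` value, are hypothetical choices for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect text inside every element whose class list contains target_class.

    Hypothetical helper; assumes reasonably well-formed HTML (every non-void
    tag inside a match is closed), otherwise the depth count drifts.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while we are inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a match, count every nested tag so we know when it closes.
        if self.depth or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def fetch_html(url, timeout=30):
    """Plain HTTP GET: no browser is started and no JavaScript runs."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})  # assumed UA string
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text_by_class(html, class_name):
    """Return the space-joined text of all elements with the given class."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For the page in the question you would call `extract_text_by_class(fetch_html(url), "details-content")`. If that comes back empty, the text is most likely injected by JavaScript, and a browser-based tool is still needed for that page.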

Another option is to start several browser instances using Selenium Grid and run your Selenium tests in parallel; this reduces the total execution time roughly in proportion to the number of browsers your hardware can run at once.
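A minimal sketch of the parallel idea, assuming each worker opens and closes its own browser session. `scrape_in_parallel` and `scrape_one_via_grid` are hypothetical helper names, and the hub address `http://localhost:4444/wd/hub` is an assumed local Grid URL; Selenium is imported inside the worker so the parallel helper itself runs without it.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_in_parallel(urls, scrape_one, max_workers=4):
    """Run scrape_one(url) for every URL on up to max_workers threads.

    Results come back in the same order as `urls`. If each call drives its
    own browser, wall-clock time drops roughly in proportion to max_workers,
    as long as the machine (or Grid) can host that many browsers at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_one, urls))


def scrape_one_via_grid(url, hub_url="http://localhost:4444/wd/hub",
                        css_class="details-content"):
    """Hypothetical worker: one remote Firefox session per URL on a Selenium Grid.

    Assumes Selenium is installed and a Grid is listening on hub_url.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # no visible window on the node
    driver = webdriver.Remote(command_executor=hub_url, options=options)
    try:
        driver.get(url)
        return driver.find_element(By.CLASS_NAME, css_class).text
    finally:
        driver.quit()  # always release the browser / Grid slot
```

With a Grid running, `scrape_in_parallel(urls, scrape_one_via_grid, max_workers=4)` would keep four sessions busy at once. Threads are enough here because each worker spends its time waiting on the remote browser, not executing Python code.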

Dmitri T