14

I've been struggling with this problem for some time, and now I'm coming back around to it. I'm attempting to use Selenium to scrape data from a URL from behind a company proxy that is configured through a PAC file. I'm using ChromeDriver, and my Chrome browser uses the PAC file in its configuration.

I've been trying to use desired_capabilities, but either the documentation is horrible or I'm not grasping something. Originally, I was scraping with BeautifulSoup, which I had working, except that the data I need now is rendered by JavaScript, which bs4 can't read.

Below is my code:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

desired_capabilities = webdriver.DesiredCapabilities.CHROME.copy()

PAC_PROXY = {
    'proxyAutoconfigUrl': 'http://proxy-pac/proxy.pac',
}
proxy = Proxy()
proxy.proxy_autoconfig_url = PAC_PROXY['proxyAutoconfigUrl']

desired_capabilities = {}
proxy.add_to_capabilities(desired_capabilities)
URL = "https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm"

service = Service(r'C:\Program Files\Chrome Driver\chromedriver.exe')  # raw string keeps the backslashes literal
driver = webdriver.Chrome(service=service)
driver.get(URL)
print(driver.requests[0].headers, driver.requests[0].response)

WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'tr.dbsearch')))
print(pd.read_html(driver.page_source)[1].iloc[:,:-1])
pd.read_html(driver.page_source)[1].iloc[:,:-1].to_csv('table.csv',index=False)

I'm not sure why I'm receiving a:

TypeError: __init__() got an unexpected keyword argument 'service'

even though I have added the path correctly to my system environment variables.

Essentially what I'm attempting to do is scrape the data in the table from https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY%20TRACT%20AND%20METABOLISM%7CATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4%7Catc%2Cepc%7Cdailymed%2Cmeshpa%7Cmesh%2Cdisease%7Cmedrt%2Cchem%7Cdailymed%2Cmoa%7Cdailymed%2Cpe%7Cdailymed%2Cpk%7Cmedrt%2Ctc%7Cfmtsme%2Cva%7Cva%2Cdispos%7Csnomedct%2Cstruct%7Csnomedct%2Cschedule%7Crxnorm then store it to a pandas dataframe and pass it to a csv file.

undetected Selenium
user1470034

5 Answers

18

If you are still using Selenium v3.x, then you shouldn't use Service(); in that case the executable_path key is the relevant one, and the line of code will be:

driver = webdriver.Chrome(executable_path=r'C:\Program Files\Chrome Driver\chromedriver.exe')

Otherwise, if you are using Selenium 4.x, then you have to use Service(), and in that case the executable_path key is no longer relevant. So you need to change the lines of code:

service = Service(executable_path=r'C:\Program Files\Chrome Driver\chromedriver.exe')
driver = webdriver.Chrome(service=service)

to:

service = Service(r'C:\Program Files\Chrome Driver\chromedriver.exe')
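
The two cases above can be folded into one helper that picks the keyword from the Selenium version the interpreter actually imports. This is only a minimal sketch: the names selenium_major, chrome_init_kwarg and make_chrome are hypothetical, and it assumes the split falls exactly at major version 4, as described above.

```python
def selenium_major(version: str) -> int:
    """Return the major component of a version string such as '4.1.0'."""
    return int(version.split(".")[0])


def chrome_init_kwarg(version: str) -> str:
    """Name the keyword that carries the driver path for this Selenium version."""
    return "executable_path" if selenium_major(version) < 4 else "service"


def make_chrome(driver_path: str):
    """Start Chrome with whichever keyword the imported Selenium accepts."""
    # Imported lazily so the two helpers above work without Selenium installed.
    import selenium
    from selenium import webdriver

    if chrome_init_kwarg(selenium.__version__) == "executable_path":
        # Selenium 3.x: the path goes straight to webdriver.Chrome().
        return webdriver.Chrome(executable_path=driver_path)

    # Selenium 4.x: the path is wrapped in a Service object.
    from selenium.webdriver.chrome.service import Service
    return webdriver.Chrome(service=Service(driver_path))
```

It would be called as make_chrome(r'C:\Program Files\Chrome Driver\chromedriver.exe'). Branching on selenium.__version__ rather than on the version pip reports matters when the two disagree.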
undetected Selenium
  • Hi Debanjan, thank you for the reply. I am using Selenium 4.10. I've updated the line to service = Service('C:\Program Files\Chrome Driver\chromedriver.exe'), but I'm still getting TypeError: __init__() got an unexpected keyword argument 'service'. The code above has also been updated to show the change. – user1470034 Dec 31 '21 at 05:36
  • Cross check once if you are still on _Selenium 3.x_ – undetected Selenium Dec 31 '21 at 20:51
  • I never had 3.x, only 4.10. What am I missing? – user1470034 Dec 31 '21 at 20:57
  • @user1470034 We are still at **v4.1.0**, where did you find _v4.10_? – undetected Selenium Dec 31 '21 at 20:58
  • Missed the . on that reply – user1470034 Jan 01 '22 at 21:07
  • You were correct: even though the Python library installed was 4.1.0, it was still being read as Selenium v3.x. Not sure how that happened, but using webdriver.Chrome(executable_path='C:\Program Files\Chrome Driver\chromedriver.exe') worked. – user1470034 Jan 03 '22 at 17:33
5

I've been having this problem also since I switched from pip Jupyter to Anaconda Jupyter. This worked for me:

from webdriver_manager.chrome import ChromeDriverManager  # third-party webdriver-manager package
driver = webdriver.Chrome(ChromeDriverManager().install())

Apparently you don't need the service in the Jupyter package.
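
A difference like this between a pip Jupyter and an Anaconda Jupyter may simply mean that each kernel imports a different Selenium install. A minimal sketch for checking this from inside the notebook, using only the standard library (Python 3.8+); installed_version and describe_environment are hypothetical helper names:

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(package: str = "selenium") -> str:
    """Pair the running interpreter's path with the package version it sees."""
    version = installed_version(package)
    found = version if version is not None else "not installed"
    return f"{sys.executable} -> {package} {found}"


# Run this in the same kernel that raises the TypeError.
print(describe_environment())
```

If the printed interpreter path is not the one where pip installed Selenium 4.x, the kernel is using a different environment.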

tshirtdr1
3

This is due to changes in Selenium 4.10.0: https://github.com/SeleniumHQ/selenium/commit/9f5801c82fb3be3d5850707c46c3f8176e3ccd8e

The first argument is no longer executable_path, but options. (ChromeDriverManager().install() returns the path to the install location.) Since Selenium Manager is now included with Selenium 4.10.0, you should no longer need ChromeDriverManager at all.

from selenium import webdriver
driver = webdriver.Chrome()
#.......
driver.quit()

However, if you still want to pass in the executable_path to an existing driver, you must use the service arg now:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service(executable_path="PATH_TO_DRIVER")
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=service, options=options)
#......
driver.quit()

I got the same error, and this worked for me in Jupyter and in VS Code too.

amaar
2

The most common reason for seeing "TypeError: __init__() got an unexpected keyword argument" today is the set of changes in Selenium 4.10.0, where most args were moved: https://github.com/SeleniumHQ/selenium/commit/9f5801c82fb3be3d5850707c46c3f8176e3ccd8e

Now, if you want to call Chrome() (or another browser), you'll need to pass args into the options and service args. Here's an example of passing in data in the new format:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(executable_path="/usr/local/bin/chromedriver")
options = webdriver.ChromeOptions()
options.add_argument('--headless=new')
options.set_capability("cloud:options", {"name": "test_1"})
driver = webdriver.Chrome(service=service, options=options)
# ...
driver.quit()

(In that example, note that passing in an executable_path is optional, as the latest version of selenium now includes a manager that automatically downloads the driver if not found on your PATH.)
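
The options object is also where the PAC file from the question could be supplied. The following is only a sketch under stated assumptions: it relies on Chrome's --proxy-pac-url command-line switch, it has not been verified against the asker's proxy, and pac_argument and make_chrome_with_pac are hypothetical helper names.

```python
def pac_argument(pac_url: str) -> str:
    """Build the Chrome command-line switch that points at a PAC file."""
    return f"--proxy-pac-url={pac_url}"


def make_chrome_with_pac(pac_url: str):
    """Start Chrome with the PAC switch set through ChromeOptions."""
    # Imported lazily so pac_argument() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(pac_argument(pac_url))
    return webdriver.Chrome(options=options)
```

With the PAC URL from the question, the call would be make_chrome_with_pac('http://proxy-pac/proxy.pac').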

Michael Mintz
1

I was able to solve this by uninstalling and re-installing Selenium:

pip uninstall selenium
pip install selenium

And then using this code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager  # third-party webdriver-manager package

service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
Haddock-san