I have a list of URLs in a .txt file that I would like to run using selenium.

Let's say the file is named b.txt and contains 2 URLs (formatted exactly as below): https://www.google.com/,https://www.bing.com/,

I am trying to make Selenium open both URLs from the .txt file, but every time the code reaches the "driver.get" line, it fails.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = open ('b.txt','r')
url_rpt = url.read().split(",")
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(chrome_options=options)
for link in url_rpt:
   driver.get(link)
driver.quit()

The result that I get when I run the code is

Traceback (most recent call last):
  File "C:/Users/ASUS/PycharmProjects/XXXX/Test.py", line 22, in <module>
    driver.get(link)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: headless chrome=79.0.3945.117)

Any suggestion on how to re-write the code?

undetected Selenium
Sakyamooni
  • 1
    What do you mean by "fails?" Are you getting an exception? If so, what is the message and stacktrace? We need this basic info. – Greg Burghardt Jan 15 '20 at 16:24
  • 1
    In the for loop above `driver.get(link)` add a line `print(link)`. – Jortega Jan 15 '20 at 16:24
  • When "the code fails" what do you mean? What is the error message? What happens if you just run `for url in url_rpt: print(url)`. This might not be an issue with Selenium, but possibly with the `url` input and reading strategy. It would help to narrow down whether or not Selenium is truly throwing the error, or if the issue is with the file. – CEH Jan 15 '20 at 16:25
  • I'll update this on the post. – Sakyamooni Jan 15 '20 at 16:40
  • @Christine: Thanks! If I run `for url in url_rpt: print(url)` it returns both links just fine. – Sakyamooni Jan 15 '20 at 16:48
  • @Sakyamooni What happens if you just run `driver.get("https://www.google.com")`? Same error? – CEH Jan 15 '20 at 17:35
  • What happens when you comment out the calls to `options.add_argument`? – Greg Burghardt Jan 15 '20 at 22:43

2 Answers

This error message...

Traceback (most recent call last):
  .
    driver.get(link)
  .
    self.execute(Command.GET, {'url': url})
  .
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=79.0.3945.117)

...implies that the URL passed as an argument to get() was invalid.

I was able to reproduce the same traceback when the text file containing the list of URLs has a space character after the separator that follows the last URL. Possibly a space character was present at the very end of b.txt, i.e. after the trailing comma in https://www.google.com/,https://www.bing.com/,


Debugging

A good debugging approach is to print url_rpt, which would have revealed the space character as follows:

  • Code Block:

    url = open ('url_list.txt','r')
    url_rpt = url.read().split(",")
    print(url_rpt)
    
  • Console Output:

    ['https://www.google.com/', 'https://www.bing.com/', ' ']
    

Solution

If you remove the space character from the end of the file (along with the trailing comma, which would otherwise leave an empty last entry), your own code would execute perfectly:

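from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# executable_path is the Selenium 3 keyword; Selenium 4 passes the driver path through a Service object instead
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
# Read the comma-separated URLs; the with block closes the file afterwards
with open('url_list.txt', 'r') as url:
    url_rpt = url.read().split(",")
print(url_rpt)
for link in url_rpt:
   driver.get(link)
driver.quit()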
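
If editing the file by hand is not practical, the entries can be cleaned in code before they reach driver.get(). The following is only a minimal sketch, not part of the original code: the helper name parse_url_list and the http/https check are assumptions added for illustration. It strips whitespace around each entry, skips the empty piece left by a trailing separator, and rejects entries that lack a scheme.

```python
from urllib.parse import urlparse


def parse_url_list(text, separator=","):
    """Split raw file contents into clean URLs, dropping blank entries.

    Hypothetical helper for illustration; not part of Selenium.
    """
    urls = []
    for entry in text.split(separator):
        candidate = entry.strip()  # drop spaces, tabs, and newlines around the entry
        if not candidate:
            continue  # skip the empty piece left by a trailing separator
        parts = urlparse(candidate)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {candidate!r}")
        urls.append(candidate)
    return urls


# Same shape as the file in the question, with a trailing comma and a space
sample = "https://www.google.com/,https://www.bing.com/, \n"
print(parse_url_list(sample))
```

With this in place, the loop would iterate over parse_url_list(url.read()) instead of url.read().split(","), so a stray space or trailing comma no longer produces an entry that get() cannot accept.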
undetected Selenium
  • 2
    Realized that there is a comma at the end of the list! Thanks a lot for highlighting this!! – Sakyamooni Jan 20 '20 at 18:22
  • 2
    I encountered the same error when i forgot to start the url with `https://` – philomath Apr 21 '21 at 06:14
  • 1
    Same as @philomath I was getting that exception on driver.get() function and I solved it by using http:// as a prefix ( http:// localhost in my case) – Dan May 29 '21 at 01:29
  • I was adding a list using a multi line string inside a function, calling .splitlines() on it, and it was counting the indentation as a new array element with four spaces. Thank you! – Duarte Nov 18 '21 at 23:26

I faced a similar issue, where Selenium errored out while opening the URL and printed the message below:

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: MicrosoftEdge=91.0.852.0)

On closer inspection, I found that my URL string was UTF-8 encoded and contained a leading ZWNBSP (zero-width no-break space, U+FEFF) character, which is why Selenium rejected the URL. I was reading the list of URLs from a file, which is where the character came from. IMO, Selenium should have reported the error more clearly, for example by saying that the URL argument was invalid.

To rectify the issue, I used the code below to clean my URL:

url = url.encode('ascii', 'ignore').decode('unicode_escape')
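
An alternative is to avoid the character at read time. A leading U+FEFF in a UTF-8 file is the byte order mark, and Python's utf-8-sig codec drops it while decoding. The sketch below assumes one URL per line and uses a hypothetical helper name, read_urls; the temporary file only stands in for the real list.

```python
import os
import tempfile


def read_urls(path):
    """Read one URL per line, ignoring blank lines (hypothetical helper)."""
    # 'utf-8-sig' decodes UTF-8 and drops a leading BOM (U+FEFF) if one is present
    with open(path, encoding="utf-8-sig") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demo: write a small file that starts with a BOM, then read it back
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("\ufeffhttps://www.google.com/\nhttps://www.bing.com/\n".encode("utf-8"))
    demo_path = tmp.name

urls = read_urls(demo_path)
os.remove(demo_path)
print(urls)
```

Unlike encoding to ASCII with errors ignored, this leaves any legitimate non-ASCII characters in the URLs untouched.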
Yogesh Kumar Gupta