
I am scraping a page to get the URLs and then using them to scrape a bunch of info. I'd like to avoid copying and pasting all the time, but I cannot figure out how to make get() work with the object. The first part of my code works perfectly well, but when I get to the part that tries to get the URL, I get the following error message:

Traceback (most recent call last):
  File "/Users/rcastong/Desktop/imgs/try-creating-object-url.py", line 61, in <module>
    driver4.get(urlworks2) 
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)

Here is part of the code:

  #this part works well    
    for number, item in enumerate(imgs2, 1):
            # print('---', number, '---')
        
            img_url = item.get_attribute("href")
            if not img_url:
                print("none")
            else:
                print('"'+img_url+'",')
        
  # the error happens on driver4.get(urlworks2)        
        for i in range(0,30):
            urlworks = img_url[i]
            urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
            driver4 = webdriver.Chrome()
            driver4.get(urlworks2) 
            def check_exists_by_xpath(xpath):
                try:
                    WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, xpath)))
                except TimeoutException:
                    return False
                return True
            
            imgsrc2 = WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, "//p[@data-testid='artistName']/ancestor::a[contains(@class,'ChildrenLink')]")))                                                                                                                 
            for number, item in enumerate(imgsrc2, 1):
                # print('---', number, '---')
                artisturls = item.get_attribute("href")
                if not artisturls:
                    print("none")
                else:
                    print('"'+artisturls+'",')
undetected Selenium
  • Hi @Remi try looking at this post it seems to be the same error https://stackoverflow.com/questions/59755609/selenium-common-exceptions-invalidargumentexception-message-invalid-argument-e – Gedeon Mutshipayi Feb 26 '22 at 16:35
  • Thx @gedflod. I had seen this post but it is slightly different than my situation. – Remi Castonguay Feb 26 '22 at 17:07

1 Answer

This error message...

Traceback (most recent call last):
  .
    driver4.get(urlworks2) 
  .
    self.execute(Command.GET, {'url': url})
  .
    self.error_handler.check_response(response)
  .
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)

...implies that the URL passed as an argument to get() was invalid.


Deep Dive

Within the first for loop, item.get_attribute("href") returns a URL string, and img_url is overwritten at every iteration. So img_url remains a single string, not a list of URLs as you assumed. As a result, in the second for loop, img_url[i] yields one character of that string at a time, and passing a single character to get() raises InvalidArgumentException: Message: invalid argument.


Demonstration

As an example, the following lines of code:

img_url = 'https://www.google.com/'
for i in range(0,5):
    urlworks = img_url[i]
    urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
    print(urlworks2)

prints:

h
t
t
p
s
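To catch this kind of mistake before it reaches the browser, one option is a small sanity check on each value before calling get(). The sketch below is only an illustration: is_navigable_url is a hypothetical helper (not part of Selenium), and it assumes you only want to visit http or https URLs.

```python
from urllib.parse import urlparse


def is_navigable_url(value):
    """Hypothetical helper: True if value looks like an absolute http(s) URL."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    # A single character such as 'h' has neither a scheme nor a host.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


img_url = 'https://www.google.com/'
whole_string_ok = is_navigable_url(img_url)      # the full URL
first_char_ok = is_navigable_url(img_url[0])     # just the character 'h'
print(whole_string_ok, first_char_ok)
```

Calling such a check right before driver4.get(...) and printing the rejected value would have shown that single characters, not URLs, were being passed.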

Solution

Declare an empty list img_url before the loop and append each href to it, so that you can iterate over the list later.

img_url = []
for number, item in enumerate(imgs2, 1):
    img_url.append(item.get_attribute("href"))
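Putting the two loops together, the pattern might look like the sketch below. It is not the asker's final code: collect_hrefs and FakeLink are hypothetical names introduced only so the example runs without a browser, and FakeLink merely imitates the get_attribute("href") call of a Selenium element. The example.com URLs are placeholders.

```python
def collect_hrefs(elements):
    """Return the non-empty href of every element, in order."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip elements that have no href
            hrefs.append(href)
    return hrefs


class FakeLink:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


# In the real script this would be the imgs2 list returned by Selenium.
imgs2 = [
    FakeLink("https://example.com/a"),
    FakeLink(None),
    FakeLink("https://example.com/b"),
]

img_url = collect_hrefs(imgs2)

# Iterate over the list itself rather than indexing with range(0, 30),
# so each value is a whole URL and the loop cannot run past the end.
for url in img_url:
    print(url)  # with a real driver: driver.get(url)
```

Iterating directly over the list also avoids an IndexError when fewer than 30 links are found.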

undetected Selenium
  • Thx @undetected Selenium. That makes sense to me and I was suspecting something like that. I will give this a try. – Remi Castonguay Feb 27 '22 at 15:27
  • Hi again, @undetected Selenium. You definitely put me on the right track, so I marked your answer as accepted. The only thing I changed was to slightly modify it and insert it within the bigger loop: for i in range(0,30): img_url = [] for number, item in enumerate(imgs2, 1): imgwors2 = item.get_attribute("href") I'm not very familiar with Stack Overflow's standards; should I post my entire modified code somewhere? – Remi Castonguay Feb 27 '22 at 16:43
  • @RemiCastonguay Feel free to raise a new question as per your new requirement. – undetected Selenium Feb 27 '22 at 16:48
  • Hi again! No, what I meant is that, thanks to you, I figured out how to make this work. Should I post my successful code somewhere? – Remi Castonguay Feb 27 '22 at 17:01
  • @RemiCastonguay If the working (your) answer is entirely different from this answer, you should consider posting a new answer for the benefit of the future readers. – undetected Selenium Feb 27 '22 at 17:03