
First off, let me say that I have researched this matter thoroughly, but I am struggling to find the reason behind this error.

I am running Selenium on Google App Engine in a Flask service, and I always get an "invalid session id" error. Here is how I initialize the driver:

```python
import chromedriver_binary  # importing it adds the bundled chromedriver to PATH
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# User agent for desktop
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36'

options = Options()

# specify headless mode
options.add_argument('--headless')
options.add_argument(f'user-agent={user_agent}')
options.add_argument('--window-size=1920,1080')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_experimental_option('useAutomationExtension', False)

# use a temporary directory instead of the (often small) /dev/shm
options.add_argument('--disable-dev-shm-usage')

# created once, at import time, and shared by every request
wd = webdriver.Chrome(options=options)
```

And this is the problematic snippet:

```python
import sys
import time

from flask import Flask

app = Flask(__name__)

# wd is the driver created in the snippet above;
# send_sms() is my own notification helper (not shown here)


@app.route("/scrape-stuff")
def scrape_stuff():
    try:
        wd.get(".....")
        time.sleep(3)
    except Exception as e:
        send_sms("Critical error")
        sys.stdout.write("Critical error" + str(e))

    # this list holds the result of the scrape
    result_array = []

    # accept cookies button ('...' must return the button element for .click() to work)
    try:
        wd.execute_script('...').click()
    except Exception:
        pass

    for i in range(1, 2):
        # Select from the dropdown
        try:
            wd.execute_script("document.querySelector('....').click()")
            time.sleep(2)
        except Exception as e:
            send_sms("Critical error: cannot select from dropdown" + str(e))
            sys.stdout.write("ERROR")

        # Press "Load more" until there is nothing left
        view_more_button = True
        while view_more_button:
            try:
                wd.execute_script('document.querySelector("....").click()')
                time.sleep(5)
            except Exception:
                # no button left (or any other error): stop clicking
                view_more_button = False

        # Get all the items
        elements = wd.find_elements('xpath', "//div[@class='.....']")

        # wd.save_screenshot("scraper.png")
        # return send_file("scraper.png")   <-- SHOWS EXPECTED OUTPUT (also, I am not blocked or asked for a CAPTCHA)

        for element in elements:
            # the leading "." keeps the XPath relative to this element
            sub_element = element.find_element('xpath', ".//div[@class='....']")  # <-- ALWAYS THROWS ERROR HERE

    return "OK"
```
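In the snippet above, the catch-all `except` in the "Load more" loop treats every exception as "no button left". If the session dies there (`InvalidSessionIdException`), the error is swallowed and only surfaces at a later driver call. In the comments, pcalkins suggests checking the return value of `execute_script`. Note that a script without an explicit `return` always comes back as `None` in Python, so the script has to return something for that check to help. Below is a minimal sketch of the idea. It assumes the button can be found with a CSS selector. The names `click_all_load_more`, `CLICK_IF_PRESENT_JS` and `max_clicks`, and the status strings, are hypothetical and not part of Selenium.

```python
import time

# JavaScript run in the page: arguments[0] is the CSS selector passed from Python.
# It returns a status string instead of throwing when the button is gone.
CLICK_IF_PRESENT_JS = """
const button = document.querySelector(arguments[0]);
if (button === null) {
    return "not found";
}
button.click();
return "clicked";
"""


def click_all_load_more(driver, selector, pause=5.0, max_clicks=100):
    """Click the "Load more" button until it disappears; return the click count.

    Exceptions raised by the driver (for example InvalidSessionIdException)
    are deliberately NOT caught, so a dead session fails at the call that hit it.
    """
    clicks = 0
    while clicks < max_clicks:
        status = driver.execute_script(CLICK_IF_PRESENT_JS, selector)
        if status != "clicked":
            break  # button no longer in the page
        clicks += 1
        time.sleep(pause)  # give the page time to append the new items
    return clicks
```

In the route this would replace the `while view_more_button` loop, e.g. `click_all_load_more(wd, "....")` with the same (elided) selector. The `max_clicks` cap is only a safety net against a button that never disappears.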

Could you please let me know what I'm doing wrong? I've literally been trying to solve this for days! Thank you!

P.S. The same code works as expected in Cloud Run and in Jupyter.

EDIT: This is the stack trace:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/app/main.py", line 360, in scrape_stuff
    sub_element = elements[element_index].find_element('xpath',".//span[@class='....']").get_attribute('innerHTML')
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 634, in execute_script
    return self.execute(command, {
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id
```
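The trace above ends with `Message: invalid session id` and does not show the browser, chromedriver or Selenium versions that pcalkins asks for in the comments. Those can be read from the running driver. The sketch below summarises them from the standard W3C `browserName`/`browserVersion` capabilities and Chrome's `chrome.chromedriverVersion` entry. The helper name `version_report` is hypothetical. It takes a plain dict so it can be called with `wd.capabilities`.

```python
def version_report(capabilities, selenium_version):
    """Summarise browser, chromedriver and Selenium versions from a capabilities dict."""
    chrome_info = capabilities.get("chrome", {})
    # chromedriver typically reports "<version> (<commit>-refs/branch-heads/...)";
    # keep only the version number
    driver_version = chrome_info.get("chromedriverVersion", "unknown").split(" ")[0]
    return {
        "selenium": selenium_version,
        "browser": capabilities.get("browserName", "unknown"),
        "browser_version": capabilities.get("browserVersion", "unknown"),
        "chromedriver_version": driver_version,
    }
```

In the app, after `import selenium`, this would be logged once at startup with `sys.stdout.write(str(version_report(wd.capabilities, selenium.__version__)) + "\n")`. A mismatch between `browser_version` and `chromedriver_version` is worth ruling out early.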

Additional edit: I printed the WebDriver session ID before and inside the loop, and everything looks as expected:

```
2022-08-24 11:46:32.000 CEST
SESSION ID BEFORE LOOP d361399a553f0ce677665c0326156c23
2022-08-24 11:46:32.000 CEST
WEBDRIVER OBJECT <selenium.webdriver.chrome.webdriver.WebDriver (session="d361399a553f0ce677665c0326156c23")>
2022-08-24 11:46:32.000 CEST
SESSION ID IN THE LOOP d361399a553f0ce677665c0326156c23
```
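Printing `wd.session_id` (or the `WebDriver` object's repr) only shows the ID cached on the Python client object. No request is sent to chromedriver, so the same ID is printed whether or not the browser side of the session is still alive. A round trip to the browser is a more telling check. The sketch below uses a trivial `execute_script("return 1")` as the probe. The helper names `session_is_alive` and `log_session` are hypothetical.

```python
import sys


def session_is_alive(driver):
    """Return (alive, error_name) by sending a trivial command to the browser.

    driver.session_id is only the ID stored on the client object, so it is
    printed unchanged even after the browser session has died.
    """
    try:
        driver.execute_script("return 1")
        return True, None
    except Exception as exc:  # e.g. InvalidSessionIdException, WebDriverException
        return False, type(exc).__name__


def log_session(driver, label):
    """Log the cached session ID together with the result of a live probe."""
    alive, error = session_is_alive(driver)
    sys.stdout.write(f"{label} session={driver.session_id} alive={alive} error={error}\n")
    return alive
```

Calling `log_session(wd, "BEFORE LOOP")` and `log_session(wd, "IN THE LOOP")` at the same points as the prints above would show whether the session is actually still reachable, rather than only that the ID is unchanged.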
Peter Malik
  • Is this GAE Flex? – NoCommandLine Aug 23 '22 at 21:53
  • @NoCommandLine Yes, it is! – Peter Malik Aug 23 '22 at 21:53
  • Can you include the full Selenium exception in your post? (It should show things like the browser/driver version and a more detailed exception.) I have a feeling this is caused by some command timeouts... these can sometimes cause Selenium to remove/kill the session because it thinks the browser has crashed (it's not receiving responses from it in time...). You may want to add more try/catches before that line to see which exception starts the timeouts building up. (Also, don't use disable-blink-features unless you have a reason to...) – pcalkins Aug 23 '22 at 23:36
  • @pcalkins Well, it's only `Message: invalid session id` when I `print(Exception)`, but I have edited the post to include the stack trace; I hope it helps. Also, another log entry showed the version, 104.0.5112.79.0. If this information doesn't help, could you please let me know what else you need? I added a try/except right before the loop and everything works as expected; the behavior only appears once I enter the loop. Thanks! – Peter Malik Aug 24 '22 at 00:33
  • I can only guess at some possible solutions here. One is to replace your `execute_script` call with a proper Selenium `click()` method (using a `WebDriverWait` with an expected condition of "to be clickable" would be good too). If there's some reason you can't do that, assign the return value there, e.g. `retval = wd.execute_script('document.querySelector("....").click()')`, and check that response for a clue. You should probably also try running your code without setting any options to see if any of them are causing an issue, especially disable-blink... – pcalkins Aug 24 '22 at 16:42
  • @pcalkins Thanks for the reply! Unfortunately, the problem is neither in `wd.execute_script` itself nor in the disabled option: I have tested it locally with and without the option, and it works. The `wd.execute_script` line itself is not the problem either, because the error appears as soon as the loop is entered, no matter which line first references the webdriver (I found this by adding lots of try/except blocks). Do you have any other suggestions? – Peter Malik Aug 24 '22 at 17:24
  • You should still monitor the results of that. Selenium can kill the session if it's waiting for previous returns (or "promises") from the browser and those time out... but before that timeout, another command times out. It doesn't seem like this should happen, but it's the cause of some of these types of issues. Another cause can be lack of disk space, but I would think the browser would actually crash in that case. You'll see that if you turn headless off. (Btw, you may also want to add the version of Selenium you are using... it might help others reproduce the behavior.) – pcalkins Aug 24 '22 at 17:29
  • Based on this [post](https://stackoverflow.com/questions/28564668/how-do-i-run-selenium-tests-with-google-app-engine), Selenium WebDriver is a framework that configures and controls browsers at the OS level, and an App Engine instance has no browser; the service cannot run web clients such as browsers. – Sarah Remo Aug 25 '22 at 00:41
  • However, I suggest using a custom runtime so that this runs inside a Docker container. You can refer to these links: [docker-selenium](https://github.com/SeleniumHQ/docker-selenium#running-the-images) and [Python Headless Browser for GAE](https://stackoverflow.com/questions/14384062/python-headless-browser-for-gae). – Sarah Remo Aug 25 '22 at 00:42
  • @SarahRemo Yes, I'm already using a Dockerfile and a GAE flex environment. I'm doing everything as you outlined :)) – Peter Malik Aug 25 '22 at 00:43
  • Have you seen these two posts: [post](https://stackoverflow.com/questions/56483403/selenium-common-exceptions-webdriverexception-message-invalid-session-id-using) and [post](https://stackoverflow.com/questions/70764346/python-selenium-error-invalid-session-id)? – Sarah Remo Aug 25 '22 at 02:40
  • @SarahRemo Yes, of course. I never close or quit the driver. – Peter Malik Aug 25 '22 at 10:49
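pcalkins mentions lack of disk space as another possible cause. Because the options include `--disable-dev-shm-usage`, Chrome creates its shared-memory files in a temporary directory rather than `/dev/shm`. Logging the free space in `/tmp` (and `/dev/shm`) next to the session ID is therefore a cheap way to rule this out. The sketch below uses only the standard library's `shutil.disk_usage`. The paths checked and the helper name `free_space_mb` are assumptions.

```python
import os
import shutil


def free_space_mb(paths=("/tmp", "/dev/shm")):
    """Return {path: free megabytes} for each path that exists on this machine."""
    report = {}
    for path in paths:
        if os.path.exists(path):
            usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
            report[path] = usage.free // (1024 * 1024)
    return report
```

Writing `free_space_mb()` to stdout before and inside the loop, next to the session logging, would show whether space runs out while the scrape is in progress.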
