
My script lets the user enter a link, goes to that website, finds a specific element on the page, and checks whether it matches the previous scraping result for the same element.

If the result is the same as last time, nothing happens.

If the result is different from last time, the user is notified.

The script scrapes that link automatically once every 3 minutes, without asking for user input again. So the user only needs to enter the link once, and the script keeps running with the same link.

It mostly works fine and can run for hours without a problem. But occasionally, very rarely, it throws an error:

Traceback (most recent call last):
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 283, in <module>
    schedule.run_pending()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\schedule\__init__.py", line 780, in run_pending
    default_scheduler.run_pending()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\schedule\__init__.py", line 100, in run_pending
    self._run_job(job)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\schedule\__init__.py", line 172, in _run_job
    ret = job.run()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\schedule\__init__.py", line 661, in run
    ret = self.job_func()
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 238, in TTC_Sniper
    Snipe2(prev_input2)
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 130, in Snipe2
    Snipe1(page_url1, count+1)
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 62, in Snipe1
    all_comments = OpenBrowser(page_url1)
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 30, in OpenBrowser
    driver.get(page_url)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 440, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 428, in execute
    self.error_handler.check_response(response)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: headless chrome=105.0.5195.127)
Stacktrace:
Backtrace:
        Ordinal0 [0x0085DF13+2219795]
        Ordinal0 [0x007F2841+1779777]
        Ordinal0 [0x00704100+803072]
        Ordinal0 [0x006F760A+751114]
        Ordinal0 [0x006F61A8+745896]
        Ordinal0 [0x006F63FD+746493]
        Ordinal0 [0x00705A8E+809614]
        Ordinal0 [0x0075F87D+1177725]
        Ordinal0 [0x0074E7FC+1107964]
        Ordinal0 [0x0075F192+1175954]
        Ordinal0 [0x0074E616+1107478]
        Ordinal0 [0x00727F89+950153]
        Ordinal0 [0x00728F56+954198]
        GetHandleVerifier [0x00B52CB2+3040210]
        GetHandleVerifier [0x00B42BB4+2974420]
        GetHandleVerifier [0x008F6A0A+565546]
        GetHandleVerifier [0x008F5680+560544]
        Ordinal0 [0x007F9A5C+1808988]
        Ordinal0 [0x007FE3A8+1827752]
        Ordinal0 [0x007FE495+1827989]
        Ordinal0 [0x008080A4+1867940]
        BaseThreadInitThunk [0x772CFA29+25]
        RtlGetAppContainerNamedObjectPath [0x777B7B5E+286]
        RtlGetAppContainerNamedObjectPath [0x777B7B2E+238]

I don't know what it means.

I Googled a bit, and some people said selenium.common.exceptions.InvalidArgumentException: Message: invalid argument is caused by issues within the url itself, such as forgetting to add "https:" or containing a space.
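
A check along the following lines could rule that explanation in or out before each `driver.get` call. It is only a sketch: `looks_like_valid_url` is a hypothetical helper, not part of the script, and it tests only for the problems mentioned above (missing scheme, missing host, stray whitespace).

```python
from urllib.parse import urlparse


def looks_like_valid_url(url):
    """Return True if url has an http(s) scheme, a host, and no whitespace.

    Hypothetical helper for illustration; it does not guarantee that
    Selenium will accept the URL, it only catches the common mistakes.
    """
    if not isinstance(url, str):
        return False
    # Reject any whitespace, including a trailing newline left by input handling.
    if any(ch.isspace() for ch in url):
        return False
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Logging `repr(page_url)` whenever this returns `False` would show exactly what value reached the browser, including invisible characters or an unexpected `None`.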

I'm pretty sure this isn't the problem, because the code had been running for about an hour before this error showed up. Also, when I run the code again with the exact same link, it works again.

Like I said, it happens only occasionally, seemingly randomly.

I don't know what could be causing this problem and I would like to know how to prevent it/make it notify me in case this happens.

If it helps, this is the part of my code that does web scraping.

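import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

def OpenBrowser(page_url):
    options = Options()
    options.add_argument('--headless')

    # A new headless Chrome instance is started on every call.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

    try:
        driver.get(page_url)
        time.sleep(3)  # fixed pause to give the page time to render

        elems1 = driver.find_elements(By.XPATH, '/html/body/div[2]/table/tbody/tr[2]/td[2]/section/div')

        all_comments = [elem1.text for elem1 in elems1]
        return all_comments
    finally:
        # Close the browser so Chrome processes do not accumulate between runs.
        driver.quit()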
  • If the page uses JavaScript to add elements, it may sometimes need more time to do so. You should use [waits](https://www.selenium.dev/documentation/webdriver/waits/) for this, or catch the error in `try/except` and repeat `find_elements` after a short delay. – furas Oct 11 '22 at 13:48
  • If you run it in a loop, then maybe you should create `driver` only once and reuse it. – furas Oct 11 '22 at 13:49
  • First, you could use `print(page_url)` to check whether you are really using the correct URL. – furas Oct 11 '22 at 13:51
  • Thank you! I didn't think of using print(page_url), I should do just that. Also I just looked up what you said about waits. Do you think the answers in https://stackoverflow.com/questions/22741591/python-selenium-webdriver-try-except-loop would work for me as well? It's from 8 years ago. – RonaLightfoot Oct 11 '22 at 14:44
  • It should work, but with `while` it may sometimes wait forever :) You could use `for x in range(10):` to try only 10 times and then give up. – furas Oct 11 '22 at 15:29
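
The bounded retry suggested in the comments could look roughly like this. It is a minimal sketch under stated assumptions: `call_with_retries` and its parameters are hypothetical names, and the helper is deliberately generic so it does not depend on Selenium.

```python
import time


def call_with_retries(func, attempts=10, delay=1.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times and return its first successful result.

    Hypothetical helper: it sleeps `delay` seconds between failed attempts
    and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            # Report each failure so rare errors are visible in the console.
            print(f"attempt {attempt}/{attempts} failed: {err!r}")
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

In the script it might be used as `call_with_retries(lambda: OpenBrowser(page_url), exceptions=(WebDriverException,))`, with `WebDriverException` imported from `selenium.common.exceptions`. `InvalidArgumentException` is a subclass of it, so the error in the traceback would be retried instead of stopping the scheduler. Unlike a `while` loop, this cannot wait forever.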

0 Answers