My script takes a link from the user, opens that page, finds a specific element, and checks whether its content matches the result of the previous scrape of the same element.
If the result is the same as last time, nothing happens.
If the result differs from last time, the user is notified.
The script scrapes that link automatically every 3 minutes without asking for input again, so the user only needs to enter the link once.
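To make the compare-and-notify logic concrete, here is a minimal sketch of it. It is not my actual code: `make_change_checker`, `fetch`, and `notify` are hypothetical names used only for illustration, and `fetch` stands in for whatever function returns the scraped text.

```python
from typing import Callable, List, Optional


def make_change_checker(
    fetch: Callable[[], List[str]],
    notify: Callable[[List[str]], None],
) -> Callable[[], None]:
    """Return a job that calls `fetch` and notifies only when the result changes."""
    previous: Optional[List[str]] = None

    def job() -> None:
        nonlocal previous
        current = fetch()
        # The first run only records a baseline; later runs compare against it.
        if previous is not None and current != previous:
            notify(current)
        previous = current

    return job
```

A job built this way could be registered with `schedule.every(3).minutes.do(job)` and driven by the usual `schedule.run_pending()` loop.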
It mostly works fine and can run for hours without a problem. But occasionally, very rarely, it throws this error:
Traceback (most recent call last):
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 283, in <module>
    schedule.run_pending()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\schedule\__init__.py", line 780, in run_pending
    default_scheduler.run_pending()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\schedule\__init__.py", line 100, in run_pending
    self._run_job(job)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\schedule\__init__.py", line 172, in _run_job
    ret = job.run()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\schedule\__init__.py", line 661, in run
    ret = self.job_func()
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 238, in TTC_Sniper
    Snipe2(prev_input2)
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 130, in Snipe2
    Snipe1(page_url1, count+1)
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 62, in Snipe1
    all_comments = OpenBrowser(page_url1)
  File "C:\Users\User\Documents\TTC_Sniper\ttc_sniper6.py", line 30, in OpenBrowser
    driver.get(page_url)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 440, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 428, in execute
    self.error_handler.check_response(response)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: headless chrome=105.0.5195.127)
Stacktrace:
Backtrace:
Ordinal0 [0x0085DF13+2219795]
Ordinal0 [0x007F2841+1779777]
Ordinal0 [0x00704100+803072]
Ordinal0 [0x006F760A+751114]
Ordinal0 [0x006F61A8+745896]
Ordinal0 [0x006F63FD+746493]
Ordinal0 [0x00705A8E+809614]
Ordinal0 [0x0075F87D+1177725]
Ordinal0 [0x0074E7FC+1107964]
Ordinal0 [0x0075F192+1175954]
Ordinal0 [0x0074E616+1107478]
Ordinal0 [0x00727F89+950153]
Ordinal0 [0x00728F56+954198]
GetHandleVerifier [0x00B52CB2+3040210]
GetHandleVerifier [0x00B42BB4+2974420]
GetHandleVerifier [0x008F6A0A+565546]
GetHandleVerifier [0x008F5680+560544]
Ordinal0 [0x007F9A5C+1808988]
Ordinal0 [0x007FE3A8+1827752]
Ordinal0 [0x007FE495+1827989]
Ordinal0 [0x008080A4+1867940]
BaseThreadInitThunk [0x772CFA29+25]
RtlGetAppContainerNamedObjectPath [0x777B7B5E+286]
RtlGetAppContainerNamedObjectPath [0x777B7B2E+238]
I don't know what it means.
I Googled a bit, and some people said that selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
is caused by a problem with the URL itself, such as a missing "https://" or a space in it.
I'm fairly sure that isn't the problem here, because the code had been running for about an hour before the error showed up. Also, when I run the code again with the exact same link, it works.
As I said, it happens only occasionally, seemingly at random.
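To rule out the URL explanation on every run rather than by eye, a pre-check along these lines could be called before `driver.get(...)`. This is only a rough sketch using the standard library; `looks_like_valid_url` is a hypothetical helper name, and it covers only the two problems mentioned above (a missing scheme and whitespace), not everything Chrome might reject.

```python
from urllib.parse import urlsplit


def looks_like_valid_url(url: str) -> bool:
    """Rough pre-check for a missing scheme or stray whitespace in a URL."""
    # Any whitespace (including a trailing newline from user input) is rejected.
    if not url or any(ch.isspace() for ch in url):
        return False
    parts = urlsplit(url)
    # Require an explicit http/https scheme and a non-empty host part.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

If this returned `False` right before a failing run, the URL variable must have changed between runs. If it returned `True`, the cause is probably elsewhere.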
I don't know what could be causing it, and I would like to know how to prevent it, or at least how to make the script notify me when it happens.
If it helps, this is the part of my code that does the web scraping:
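import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

def OpenBrowser(page_url):
    # Start a new headless Chrome session on every call
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(page_url)
    time.sleep(3)  # fixed wait for the page to finish rendering
    html = driver.find_element(By.TAG_NAME, 'html')
    # Collect the text of every element matching the target XPath
    elems1 = driver.find_elements(By.XPATH, '/html/body/div[2]/table/tbody/tr[2]/td[2]/section/div')
    all_comments = [elem1.text for elem1 in elems1]
    return all_comments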
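For the "notify me when this happens" part, one possible approach is to wrap the scraping call so that an exception is reported and retried instead of ending the whole script. The sketch below is an assumption about how that could look, not a confirmed fix: `run_with_retry`, `notify`, `retries`, and `delay_seconds` are hypothetical names, and it catches the broad `Exception` to stay self-contained (with Selenium, `selenium.common.exceptions.WebDriverException` would be a narrower choice).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(
    job: Callable[[], T],
    notify: Callable[[str], None],
    retries: int = 2,
    delay_seconds: float = 5.0,
) -> Optional[T]:
    """Run `job`; on failure notify, then retry up to `retries` more times."""
    for attempt in range(1, retries + 2):
        try:
            return job()
        except Exception as exc:  # broad on purpose; narrow it in real code
            notify(f"attempt {attempt} failed: {exc!r}")
            if attempt <= retries:
                time.sleep(delay_seconds)
    # Every attempt failed; return None so the scheduler loop keeps running.
    return None
```

It could be called as `run_with_retry(lambda: OpenBrowser(page_url), print)`, with `print` swapped for whatever notification the script already uses. Separately, `OpenBrowser` never calls `driver.quit()`, so each run leaves a Chrome session open. I have not confirmed whether that is related to the error.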