
I am downloading multiple reports from a website. Each report has its own URL. Some URLs are loading fine without a problem.

But there is one URL that produces the following error. All URLs have the same pattern except for the query parameter. If I open the URL in my browser, it works fine.

I am using Selenium and the Firefox driver.

Below is my code:

import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# firefox_driver_location, sa360_query_array, google_login, email_login and
# email_password are defined elsewhere in the script (not shown).

if __name__ == "__main__":
    firefox_options = Options()
    # firefox_options.headless = True
    driver = webdriver.Firefox(options=firefox_options, executable_path=firefox_driver_location)

    logged_in = "no"
    for query in sa360_query_array:
        print("query being processed is " + query)
        # Two of the reports are much slower to load than the rest.
        if "270348" in query or "269756" in query:
            wait_time = 300
        else:
            wait_time = 15
        driver.get(query)
        print("Page wait time is " + str(wait_time))
        driver.implicitly_wait(wait_time)
        # print(driver.page_source)
        if logged_in != "yes":
            google_login(query, email_login, email_password)  # This function logs into the Google account
            print("Sleeping 200 seconds")
            time.sleep(200)

        logged_in = "yes"

        # time.sleep(200)
        print("reading HTML")
        # print(driver.page_source)
        read_web = pd.read_html(driver.page_source)  # The error occurs on this line

What is that error referring to?

undetected Selenium
Quinnystar27

1 Answer


This error message...

WebDriverException: Message: [Exception...  "Failure"  nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 276"  data: no]

...implies that Marionette threw an error while attempting to read, store, or copy the page_source.

The relevant HTML DOM / DOM tree would have helped us debug the issue better. However, the issue seems to be that the page_source is immensely large and exceeds the maximum size Marionette can handle. Possibly it's a much bigger string you're dealing with.
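One way to check the size hypothesis without pulling the whole string through Marionette is to ask the browser for the length of the serialized DOM only. The following is a minimal sketch, assuming driver is the existing Firefox WebDriver instance; page_source_length and report_page_sizes are hypothetical helper names, and whether this call succeeds on the failing page is itself an assumption to verify.

```python
def page_source_length(driver):
    """Return the number of characters in the serialized DOM.

    Only an integer is sent back from the browser, not the markup itself.
    """
    # execute_script is a standard WebDriver method; the script is plain JavaScript.
    return driver.execute_script(
        "return document.documentElement.outerHTML.length;"
    )


def report_page_sizes(driver, urls):
    """Load each URL and map it to its DOM length (hypothetical helper)."""
    sizes = {}
    for url in urls:
        driver.get(url)
        sizes[url] = page_source_length(driver)
    return sizes
```

Comparing the number for the failing URL against the URLs that load fine should show whether that page really is far larger than the others.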


Solution

A quick solution would be to avoid assigning the page_source to a variable and instead print it, to find out where the actual issue lies:

print(driver.page_source)
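Along the same lines, fetching and parsing can be run as two separate steps so that the traceback shows which one fails. This is only a sketch: find_failing_step is a hypothetical helper, and it assumes the caller passes one callable that reads the page source and one that parses it.

```python
def find_failing_step(fetch_source, parse_tables):
    """Run fetch and parse separately and report which one raised.

    Returns a (step, result) tuple where step is "fetch", "parse", or "ok".
    """
    try:
        html = fetch_source()
    except Exception as exc:  # e.g. Selenium's WebDriverException
        return "fetch", exc
    try:
        tables = parse_tables(html)
    except Exception as exc:  # e.g. ValueError when pandas finds no tables
        return "parse", exc
    return "ok", tables


# Possible usage with the names from the question (assumed to be in scope):
# step, result = find_failing_step(lambda: driver.page_source, pd.read_html)
# print(step, result)
```

If the step comes back as "fetch", the problem sits between Selenium and Marionette rather than in pandas.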

Another aspect to look into would be pd.read_html(), though I am not sure whether it plays a part here.


undetected Selenium
  • It is a very long page compared to the others. Let me try printing out the page source and see what I can discover – Quinnystar27 Aug 26 '19 at 08:52
  • @Quinnystar27 _...It is a very long page compared to the others..._ is exactly what I emphasized as _**Possibly it's a much bigger string you're dealing with**_ within my answer. – undetected Selenium Aug 26 '19 at 08:54