I am trying to create a generic web crawler that visits a site and takes a screenshot. I am using Python, Selenium, and PhantomJS. The problem is that the screenshot does not capture all the images on a page. For example, on YouTube it doesn't capture the images below the main page image. (I don't have high enough rep to post a screenshot.) I think this may have something to do with dynamic content, but I have already tried the wait functions, such as implicitly_wait and set_page_load_timeout. Because this is a generic crawler, I can't wait for a specific event (I want to crawl hundreds of sites).

Is it possible to create a generic web crawler that can capture the whole page this way? The code I am using is:

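from selenium import webdriver

phantom = webdriver.PhantomJS()
phantom.set_page_load_timeout(30)
phantom.get(response.url)
img = phantom.get_screenshot_as_png()  # raw PNG bytes (get_screenshot_as_base64() returns a Base64 string)
phantom.quit()  # the parentheses are needed, otherwise the browser is never closed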


Nagaraj Tantri
Malcolm
  • The solution is probably increasing the viewportSize, then scrolling down the page and finally waiting a bit. – Artjom B. Oct 06 '14 at 07:08
  • see here: http://stackoverflow.com/questions/37906704/taking-a-whole-page-screenshot-with-selenium-marionette-in-python/42531572 – Martin Krung Mar 01 '17 at 12:22

1 Answer

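Your suggestion solved the problem. Scrolling through the page in small steps before taking the screenshot gives the images further down a chance to load. I used the following code (adapted in part from an answer to another question):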

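from selenium import webdriver

driver = webdriver.PhantomJS()
driver.maximize_window()
driver.get('http://youtube.com')
# Scroll through the page in small steps before taking the screenshot.
# The divisor grows from 0.1 to 9.9, so the scroll target moves from the
# bottom of the page up to roughly a tenth of the page height.
scheight = .1
while scheight < 9.9:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
    scheight += .01
driver.save_screenshot('screenshot.png')
driver.quit()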
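
For what it's worth, here is a rough sketch of the same idea with an explicit pause between scroll steps, along the lines of the comment about scrolling down and waiting a bit. The function name `scroll_through_page` and the parameters `pause` and `max_steps` are made up for illustration, and the default values are guesses rather than tuned numbers. It only relies on `execute_script`, so it should work with any WebDriver instance, but I have not tested it against every site.

```python
import time


def scroll_through_page(driver, pause=0.5, max_steps=50):
    """Scroll from top to bottom one viewport at a time.

    `driver` is any object with a Selenium-style execute_script method.
    `pause` (seconds) and `max_steps` are hypothetical defaults.
    Returns the list of scroll offsets that were visited.
    """
    # Fall back to an arbitrary step if the browser reports no viewport height.
    step = driver.execute_script("return window.innerHeight;") or 600
    offsets = []
    offset = 0
    for _ in range(max_steps):
        driver.execute_script("window.scrollTo(0, arguments[0]);", offset)
        offsets.append(offset)
        time.sleep(pause)  # give dynamically loaded content time to arrive
        # Re-read the height on every step because it can grow as content loads.
        height = driver.execute_script("return document.body.scrollHeight;")
        if offset + step >= height:
            break
        offset += step
    return offsets


# Possible usage (the window size is an arbitrary choice):
#   driver.set_window_size(1280, 1024)
#   driver.get('http://youtube.com')
#   scroll_through_page(driver)
#   driver.save_screenshot('screenshot.png')
```

The `max_steps` cap is there so that pages with infinite scrolling cannot keep the crawler busy forever.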
Malcolm
  • Well, code-only answers aren't *very* helpful for others. Please explain **why** you're doing this and **how** this works. – Remi Guan Feb 08 '16 at 16:27
  • some working code here: http://stackoverflow.com/questions/37906704/taking-a-whole-page-screenshot-with-selenium-marionette-in-python/42531572 – Martin Krung Mar 01 '17 at 12:22
  • See this post https://stackoverflow.com/a/57338909/2943191 for more information if you run into problems. – Klaidonis Aug 03 '19 at 13:55