
I recently completed my Selenium Python scraper. It works fine when I run it on my personal machine, but the results are not the same when I run it on the server. On the server I run it headless using pyvirtualdisplay.

from selenium import webdriver
browser = webdriver.Firefox()  # as given in the comments
browser.get('https://example.com')
html = browser.page_source

And this is my code for pyvirtualdisplay:

from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 600))
display.start()

On my local machine, it grabs the HTML generated by JavaScript, but on my server it doesn't, so I end up with only a partial page containing none of the JS-generated content.

Update: As suggested, I also took screenshots using Selenium. They show that the page is only partially loaded and the JS-generated content isn't rendered on the screen.

abhanan93
  • How is `browser` defined? Thanks. – alecxe Jun 15 '16 at 14:02
  • @alecxe `browser = webdriver.Firefox()` – abhanan93 Jun 15 '16 at 14:36
  • On the server, is it running on an actual GUI or in a frame buffer (headless)? – Mo H. Jun 15 '16 at 15:19
  • I've run into issues like this before, it can be a number of things. What are you using for your frame buffer? Can you also include the command you are using to run your tests on the server? – Mo H. Jun 15 '16 at 15:29
  • Also can you include an example of what you expect to get vs what you actually get? – Mo H. Jun 15 '16 at 15:35
  • I use `python script.py`. The same I use on my local machine. I've implemented the scraper in Flask framework. And I am using pyvirtualdisplay for my frame buffer. – abhanan93 Jun 15 '16 at 15:36
  • I expect to get a fully loaded page. Some of the site HTML is generated using JavaScript which I need to get using `browser.page_source`. – abhanan93 Jun 15 '16 at 15:37
  • I understand that, but can you give us an example of what you expect and what you get? Are you getting a partial page or no page at all? And for pyvirtualdisplay, what is the framebuffer? – Mo H. Jun 15 '16 at 15:38
  • I am getting a partial page. The content loaded by JavaScript is not present in the HTML I get through `browser.page_source`. On the other hand, when I run the script on my local machine, I get the full page. – abhanan93 Jun 15 '16 at 15:40
  • Can you show us your code for pyvirtualdisplay as well? – Mo H. Jun 15 '16 at 15:46
  • Added in the post. – abhanan93 Jun 15 '16 at 15:55
  • Ok last question (I hope). What browser are you using, are they same version on both machines? – Mo H. Jun 15 '16 at 16:01
  • I am using Firefox. And yes, they are the same versions. Firefox 44.0.2 to be precise. – abhanan93 Jun 15 '16 at 16:02
  • And same type of OS? Try taking a screenshot and seeing if the page even renders right – Mo H. Jun 15 '16 at 16:04
  • My local machine has Ubuntu 14.04 and the server has CentOS 7. – abhanan93 Jun 15 '16 at 16:07

1 Answer


This sounds like an issue with your OS or browser configuration. The first thing you should do is take a screenshot of what is rendered in your framebuffer and make sure that Firefox is loading JS content properly. If it is not, you may need to check your browser/OS configuration.

pyvirtualdisplay has a way of taking screenshots that you can look at here

specflow can also take screenshots, instructions here
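Since the question already uses Selenium, the screenshot check can also be done with the driver itself. The following is only a minimal sketch: `capture_diagnostics`, the output file names, and the `marker` string are hypothetical names chosen for illustration, while `save_screenshot` and `page_source` are the standard Selenium WebDriver members. It saves what the framebuffer shows next to the HTML the driver returns, so the two can be compared between the local machine and the server.

```python
import os


def capture_diagnostics(browser, out_dir, marker):
    """Save a screenshot and the page source; report whether `marker` is present.

    `browser` is a Selenium WebDriver (for example webdriver.Firefox()).
    `marker` is an assumption: some text that only appears once the
    JavaScript-generated content has been rendered.
    """
    os.makedirs(out_dir, exist_ok=True)
    screenshot_path = os.path.join(out_dir, "page.png")
    source_path = os.path.join(out_dir, "page.html")

    # What the (virtual) display actually shows at this moment.
    browser.save_screenshot(screenshot_path)

    # The DOM as serialized by the driver at this moment.
    html = browser.page_source
    with open(source_path, "w", encoding="utf-8") as fh:
        fh.write(html)

    return {
        "screenshot": screenshot_path,
        "source": source_path,
        "marker_found": marker in html,
    }


# Example usage (requires Selenium and a running display):
# browser.get('https://example.com')
# print(capture_diagnostics(browser, "diagnostics", "text-rendered-by-js"))
```

Running the same call on both machines and comparing the saved files shows whether the missing content is absent from the rendered page as well as from the HTML.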

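If it is a Firefox/browser issue, make sure that JavaScript is enabled for that browser on your server. Firefox runs JavaScript natively, so no plugin or Java installation is needed for it (Java is unrelated to JavaScript), but check that the `javascript.enabled` preference in about:config has not been turned off.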
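To compare the browser setup on the two machines, it may help to record what the driver reports about itself. This is a rough sketch, not a definitive check: `describe_browser` is a hypothetical helper name, and the capability keys differ between driver generations (newer drivers report `browserVersion`, older ones `version`), so both are tried.

```python
def describe_browser(browser):
    """Collect basic facts about the running browser for side-by-side comparison.

    `browser` is a Selenium WebDriver; `capabilities` and `execute_script`
    are standard WebDriver members.
    """
    caps = browser.capabilities
    return {
        "name": caps.get("browserName"),
        # Newer drivers use "browserVersion"; older ones use "version".
        "version": caps.get("browserVersion") or caps.get("version"),
        # Ask the page context for its own view of the browser.
        "user_agent": browser.execute_script("return navigator.userAgent;"),
        # 'complete' means the initial document and its resources have loaded.
        "ready_state": browser.execute_script("return document.readyState;"),
    }


# Example usage (requires Selenium and a running display):
# browser.get('https://example.com')
# print(describe_browser(browser))
```

If the output differs between the local machine and the server, that difference is a reasonable place to start looking.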

Mo H.
  • I have an Ubuntu server too. The first time I tested the script on that server, it worked flawlessly. But when I ran the script again, it failed with the same problem as on the CentOS server. And on the CentOS server, it didn't work even once. Thanks for recommending the screenshots. I am looking into them and will notify you. – abhanan93 Jun 15 '16 at 16:16
  • So as you suggested, I also took screenshots. They show that the page is partially loaded and the JS-generated content isn't loaded. – abhanan93 Jun 15 '16 at 17:30
  • @abhanan93 So the issue isn't Selenium then. It's your OS, your framebuffer, or your browser – Mo H. Jun 15 '16 at 17:34
  • What do you suggest then? – abhanan93 Jun 15 '16 at 17:39
  • Updated answer to reflect suggestions – Mo H. Jun 15 '16 at 17:39