I want to download the full HTML of a webpage and have written some code to do this. However, when I look at the downloaded HTML, only about half of it is there. I think this is because the page is dynamic and loads more content as you interact with it. I have been trying to use PhantomJS together with ChromeDriver Manager, but with no luck. This is the code that downloads only part of the HTML (again, I believe because the page is dynamic):

from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import os
import re
import time

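# Chrome options (defaults assumed here)
chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
driver.get('https://medium.com/@benjaminhardy')
time.sleep(25)  # wait for the initial page load
html = driver.page_source  # only the content rendered so far
driver.close()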

This is my attempt with PhantomJS but no luck:

driver = webdriver.Chrome(ChromeDriverManager().install().PhantomJS())
driver.get('https://medium.com/@benjaminhardy')
html = driver.page_source
time.sleep(25)
driver.close()

Error: 'str' object has no attribute 'PhantomJS'
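
A likely reading of this error: `ChromeDriverManager().install()` returns the path to the downloaded chromedriver binary as a plain string, so calling `.PhantomJS()` on it fails before any browser starts. The minimal sketch below reproduces the same message without Selenium; `driver_path` and its value are hypothetical stand-ins for the real return value.

```python
# Hypothetical stand-in for the string returned by ChromeDriverManager().install();
# the real path differs per machine.
driver_path = "/hypothetical/path/to/chromedriver"

try:
    # Same shape as the failing call: a method lookup on a plain str.
    driver_path.PhantomJS()
except AttributeError as exc:
    message = str(exc)

print(message)
```

PhantomJS is a separate headless browser rather than something ChromeDriver Manager provides, so chaining the two this way cannot work regardless of how the page loads.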
  • By 'no luck', you mean that PhantomJS driver makes no apparent difference? – DaveIdito Aug 01 '20 at 15:23
  • yes @Daveldito, I was under the impression PhantomJS would render the full html before the download, but I am not too familiar with it. – AndrewLittle1 Aug 01 '20 at 15:26
  • Sadly I have never used PhantomJS, but this might help you: https://stackoverflow.com/questions/28928068/scroll-down-to-bottom-of-infinite-page-with-phantomjs-in-python I don't see anywhere in your code where you actually scroll and load until the end of the page (which, as I said, I myself don't know how to do with PhantomJS) – DaveIdito Aug 01 '20 at 15:38
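
The scrolling idea from the last comment could be sketched roughly as follows, using plain Selenium with Chrome instead of PhantomJS. This is an untested-against-Medium sketch, not a confirmed fix: the helper names `scroll_to_bottom` and `fetch_full_html`, the `pause` and `max_rounds` parameters, and their defaults are all assumptions made for illustration. It relies only on the driver's `get`, `execute_script`, and `page_source`.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    Returns the number of scroll steps performed. `pause` (seconds) and
    `max_rounds` are illustrative defaults, not tuned values.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom so the page can request more content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the newly requested content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        rounds += 1
        if new_height == last_height:
            break  # height unchanged: assume nothing more was loaded
        last_height = new_height
    return rounds


def fetch_full_html(driver, url, pause=2.0, max_rounds=50):
    """Load `url`, scroll to the end, and return the rendered HTML."""
    driver.get(url)
    scroll_to_bottom(driver, pause=pause, max_rounds=max_rounds)
    return driver.page_source
```

With a `driver` created as in the question, `html = fetch_full_html(driver, 'https://medium.com/@benjaminhardy')` would replace the fixed `time.sleep(25)`. Comparing heights is a heuristic: a slow network may need a longer `pause`, and a page that never stops loading is cut off by `max_rounds`.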
