I am trying to download thousands of HTML pages in order to parse them. I tried using Selenium, but the downloaded files do not contain all the text I see in the browser. This is my code:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from webdriver_manager.chrome import ChromeDriverManager

    # URL_list, DOWNLOAD_PATH and file_name are defined elsewhere (not shown)
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    browser = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)

    for url in URL_list:
        browser.get(url)
        # page_source is read immediately after get() returns
        content = browser.page_source
        with open(DOWNLOAD_PATH + file_name + ".html", "w", encoding='utf-8') as file:
            file.write(str(content))

    # close the browser once, after all pages have been saved
    browser.close()

However, the HTML file I get doesn't contain all the content I see in the browser for the same page. For example, some text that is visible on screen is not found in the HTML file. Only when I right-click the page in the browser and choose "Save As" do I get the full page.
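A check along these lines could confirm exactly which visible text is absent from a saved file. It is only a sketch and not part of the script above: `missing_snippets`, `_TextCollector` and the sample strings are hypothetical names chosen for illustration. It uses only the standard-library `html.parser` module and assumes the expected text is not split across inline tags.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_snippets(html, expected_snippets):
    """Return the snippets that do not appear in the text content of html."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Collapse all runs of whitespace so line breaks in the markup don't matter.
    text = " ".join(" ".join(collector.chunks).split())
    return [s for s in expected_snippets if " ".join(s.split()) not in text]
```

Running it on the file written by Selenium and on the file produced by "Save As", with a few phrases copied from the screen, would show which phrases only the second file contains.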
Example URL: https://www.camoni.co.il/411788/1Jacob
Thank you.