
The following code is "working" in terms of extraction, but the output of each new page overwrites the previous one in the main html output 'file.' I'm new to this and am sure it's a silly coding error, but I'm just not seeing it.

In other words, it is working through the pages and extracting the information but each time it completes a page, it overwrites what is already in the html so at any given time I have only p. 2 or p. 16, etc. I need it to either keep adding to the page or create an html file for each page (I think the latter is preferred?).

Any help would be most appreciated.

This is just one part of a larger script, but I'm trying to make sure each part works properly before running the whole thing.

Thanks for your time!

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep
import os

allpages = []
for i in range(2, 1575):  # the main page is a different url so starting on p. 2
    allpages.append("url here" + str(i))

completedlist = []

for eachpage in allpages[0:2]:  # just testing; will change to :1575
    options = Options()
    options.headless = True
    driver = webdriver.Chrome(options=options, executable_path='mypath')
    driver.get(eachpage)
    print('Headless Chrome Initialized: ' + eachpage)

    with open("./capture/filenamehere" + str(i) + ".html", "w") as f:
        f.write(driver.page_source)

    completedlist.append(eachpage)
Mere

1 Answer

You are opening the file in write mode ('w'), so its contents are overwritten every time. Change 'w' to 'a' (append mode) in the call to open; the file will then no longer be overwritten, and new content will be appended to the end.
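
One more thing worth checking: in the question's code, `str(i)` in the filename uses the `i` left over from the first loop, which is always 1574 once that loop has finished, so every page is written to the same file. Below is a minimal sketch of both options (one file per page, or appending to a single file). The names `save_pages`, `append_pages`, `fetch`, and the `page{n}.html` filename pattern are hypothetical, chosen only for illustration; `fetch` stands in for whatever returns the page HTML.

```python
from pathlib import Path


def save_pages(urls, fetch, out_dir, start=2):
    """Write each page to its own file, numbered from `start`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # enumerate gives every URL its own number, so filenames never collide
    for page_number, url in enumerate(urls, start=start):
        target = out / f"page{page_number}.html"  # hypothetical naming scheme
        target.write_text(fetch(url), encoding="utf-8")
        written.append(target)
    return written


def append_pages(urls, fetch, out_file):
    """Append every page to one file; mode 'a' keeps the existing content."""
    with open(out_file, "a", encoding="utf-8") as f:
        for url in urls:
            f.write(fetch(url))
            f.write("\n")
```

With Selenium, `fetch` could be a small function that calls `driver.get(url)` and returns `driver.page_source`. One file per page is probably easier to work with later, since a single appended file would contain many complete HTML documents back to back.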

Usama Tariq
  • Thank you so much! I knew it was something silly I was missing. You have made my day. – Mere Apr 03 '21 at 22:12
  • I did, but it apparently isn't showing because I don't have enough reputation points yet. Sorry about that. I'll work on getting some rep points so it will show up. – Mere Apr 04 '21 at 13:58