
I am trying to loop over 'urlpage' with sequentially increasing serial numbers, but I only get the 0021 zip file, and only after Firefox asks me whether to download it. What is wrong with my code, and how can I make it open every URL for the serial numbers in my loop?

import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd
import os

j=''
k=1
while k < 4:
    j='002'+ str(k)
    print(str(j))
    if k>0:
        urlpage = 'https://www150.statcan.gc.ca/n1/tbl/csv/3210'+j+'-eng.zip' 
        print(urlpage)
    k+=1
        # run firefox webdriver from executable path of your choice
    driver = webdriver.Firefox()
        # get web page
    driver.get(urlpage)
        # execute script to scroll down the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
        # sleep for 30s
    time.sleep(30)
    driver.quit()
0021
https://www150.statcan.gc.ca/n1/tbl/csv/32100021-eng.zip
mik1904
code dummy
  • The url points to a zip file from the look of it. That file must be downloaded somewhere on your device before you can do anything with it. You can't execute JavaScript on a .zip file as if it were a page in a browser. Are you sure you have the correct url? – Buckeye14Guy Jun 28 '19 at 15:32
  • Yes, the URL is a download page so when I open it, it pops a download option. The URL is part of my loop. – code dummy Jun 28 '19 at 15:39
  • Right. You have a `.get` method on the driver, so all you are telling it is "go to this page". When it does that, all it sees is a zip file which should be downloaded. If you do not have automatic download enabled, then it will prompt you for confirmation – Buckeye14Guy Jun 28 '19 at 15:41
  • I think you need to set the download options before calling `.get` on the driver. Take a look at this [Post](https://stackoverflow.com/questions/25251583/downloading-file-to-specified-location-with-selenium-and-python) – Buckeye14Guy Jun 28 '19 at 15:45
  • Thanks for the post, I've tried it. But it still only gives me one zip file lol...sadly... – code dummy Jun 28 '19 at 17:46
  • The loop over the website url is still not working as it should for some reason... – code dummy Jun 28 '19 at 17:47

1 Answer


I do not understand why you want to scroll down on that specific urlpage. Your link takes you directly to a zip file, which must be downloaded, and you cannot scroll down on a zip file. I did something similar once with chromedriver, so perhaps this will help. I am not sure whether it will be any different with the Firefox driver (at the very least there won't be any ChromeOptions).

Versions used: Python 3.6, Selenium 3.14.1 (selenium.__version__)

import time
import zipfile
import pathlib
from selenium import webdriver

cwd = pathlib.Path.cwd()
chrome_driver = cwd / 'chromedriver.exe'
download_folder = cwd / 'downloads'
download_folder.mkdir(exist_ok=True)  # create the download folder if it is missing

for k in range(1, 4):
    # build the serial number and the url for this iteration
    j = f'002{k}'
    print(j)
    urlpage = f'https://www150.statcan.gc.ca/n1/tbl/csv/3210{j}-eng.zip'
    print(urlpage)

    # run chrome instead - the only reason for this is because I used it before :)
    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", {"download.default_directory": str(download_folder)})
    # `options=` replaces the deprecated `chrome_options=` keyword
    driver = webdriver.Chrome(str(chrome_driver), options=options)

    # get web page; for this url that simply starts the download
    driver.get(urlpage)

    # Your page is not a WEBPAGE. It is a ZIP file. You cannot scroll anywhere on a zip file
    # driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")

    # sleep for 30s to give the download time to finish
    time.sleep(30)

    # you can unzip here if you want
    downloaded_file = urlpage.split('/')[-1]
    directory_to_unzip_to = download_folder / downloaded_file.split('.')[0]
    with zipfile.ZipFile(download_folder / downloaded_file, 'r') as zip_ref:
        zip_ref.extractall(directory_to_unzip_to)

    driver.quit()
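
Since the question uses Firefox, here is a rough sketch of what the equivalent download settings might look like there. I have not run this against the site. The helper names `firefox_download_prefs` and `make_firefox_driver` are made up for illustration, and the MIME types listed are a guess at what the server sends for the zip file. The preference names themselves are standard Firefox settings.

```python
from pathlib import Path


def firefox_download_prefs(folder):
    """Return Firefox preferences that should save zip files to `folder` without a prompt."""
    return {
        # 2 means "use the custom directory given in browser.download.dir"
        "browser.download.folderList": 2,
        "browser.download.dir": str(Path(folder).resolve()),
        # assumption: the zip is served with one of these MIME types
        "browser.helperApps.neverAsk.saveToDisk": "application/zip,application/octet-stream",
    }


def make_firefox_driver(folder):
    """Start Firefox with the download preferences applied (hypothetical helper)."""
    # Selenium is imported here so the helper above can be used without it installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in firefox_download_prefs(folder).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

With something like this, `driver = make_firefox_driver('downloads')` would take the place of `webdriver.Firefox()` in the question's loop, and the scrolling script would be dropped.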

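Because the link goes straight to a zip file, a browser may not be needed at all. Below is a minimal sketch that swaps Selenium for `urllib.request`, which the question already imports. The names `BASE_URL`, `build_urls` and `download_and_extract` are hypothetical, and I am assuming the server accepts plain HTTP requests that do not come from a browser.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL pattern taken from the question; only the serial number changes
BASE_URL = "https://www150.statcan.gc.ca/n1/tbl/csv/3210{serial}-eng.zip"


def build_urls(start, stop):
    """Build the urls for serials 002<start> up to 002<stop - 1>, like the question's loop."""
    return [BASE_URL.format(serial=f"002{k}") for k in range(start, stop)]


def download_and_extract(url, download_folder):
    """Download one zip file into `download_folder` and unzip it next to the archive."""
    download_folder = Path(download_folder)
    download_folder.mkdir(parents=True, exist_ok=True)

    # name the local file after the last part of the url, e.g. 32100021-eng.zip
    zip_path = download_folder / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(zip_path))

    # extract into a folder named after the archive, without the .zip suffix
    target = download_folder / zip_path.stem
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

Calling `download_and_extract(url, 'downloads')` for each url in `build_urls(1, 4)` would then cover the same three serial numbers as the loop in the question, with no sleep needed because `urlretrieve` returns only once the file is written.
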

Buckeye14Guy