
So, I have this code that downloads all the files at a URL, but every day a new file is added. How can I avoid re-downloading the files I already have?

import os
from urllib.parse import urljoin
import requests

for link in soup.select("a[href$='v2.pdf']"):
    filename = os.path.join(folder_location, link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url, link['href'])).content)

1 Answer


From the question How do I list all files of a directory? you can list all of the files currently in a folder with this code:

from os import listdir
from os.path import isfile, join

# Point mypath at the same folder the PDFs are saved to (folder_location)
mypath = './'
files = [f for f in listdir(mypath) if isfile(join(mypath, f))]
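As a side note, the same listing can be written with pathlib if you prefer it; this is an equivalent alternative, not part of the original answer:

from pathlib import Path

# Path.iterdir() yields every entry in the folder; keep only regular files,
# and .name gives the bare filename without the folder prefix
mypath = Path('./')
files = [p.name for p in mypath.iterdir() if p.is_file()]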

After you have the existing filenames in a list, you can check each link against it before downloading:

for link in soup.select("a[href$='v2.pdf']"):
    name = link['href'].split('/')[-1]
    # Compare the bare filename: listdir() returns names without the folder
    # prefix, so joining with folder_location first would never match
    if name not in files:
        filename = os.path.join(folder_location, name)
        with open(filename, 'wb') as f:
            f.write(requests.get(urljoin(url, link['href'])).content)
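For completeness, here is a minimal end-to-end sketch putting both pieces together; the url and folder_location values are placeholders, and the page is assumed to contain plain anchor tags linking to the PDFs:

import os
from os import listdir
from os.path import isfile, join
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://example.com/reports/'   # hypothetical page with the PDF links
folder_location = './downloads'
os.makedirs(folder_location, exist_ok=True)

# Fetch and parse the page, as in the question
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Bare names of files already present in the download folder
files = [f for f in listdir(folder_location) if isfile(join(folder_location, f))]

for link in soup.select("a[href$='v2.pdf']"):
    name = link['href'].split('/')[-1]
    if name not in files:  # skip anything downloaded on a previous run
        with open(os.path.join(folder_location, name), 'wb') as f:
            f.write(requests.get(urljoin(url, link['href'])).content)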
Norton409