
So, I have this code that downloads all the files at a URL, but every day a new file is added. How can I avoid re-downloading the files I already have?

import os
from urllib.parse import urljoin
import requests

for link in soup.select("a[href$='v2.pdf']"):
    filename = os.path.join(folder_location, link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url, link['href'])).content)

1 Answer


From the question How do I list all files of a directory? you can list all of the files currently in a folder with this code:

from os import listdir
from os.path import isfile, join

# Point mypath at the same folder the PDFs are saved to (folder_location)
mypath = './'
files = [f for f in listdir(mypath) if isfile(join(mypath, f))]
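As a side note, the same listing can be written with pathlib if you prefer it; this is an equivalent alternative, not part of the original answer:

from pathlib import Path

# Path.iterdir() yields every entry in the folder; keep only regular files,
# and .name gives the bare filename without the folder prefix
mypath = Path('./')
files = [p.name for p in mypath.iterdir() if p.is_file()]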

After you have the existing filenames in a list, you can check each link against it before downloading:

for link in soup.select("a[href$='v2.pdf']"):
    name = link['href'].split('/')[-1]
    # Compare the bare filename: listdir() returns names without the folder
    # prefix, so joining with folder_location first would never match
    if name not in files:
        filename = os.path.join(folder_location, name)
        with open(filename, 'wb') as f:
            f.write(requests.get(urljoin(url, link['href'])).content)
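For completeness, here is a minimal end-to-end sketch putting both pieces together; the url and folder_location values are placeholders, and the page is assumed to contain plain anchor tags linking to the PDFs:

import os
from os import listdir
from os.path import isfile, join
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://example.com/reports/'   # hypothetical page with the PDF links
folder_location = './downloads'
os.makedirs(folder_location, exist_ok=True)

# Fetch and parse the page, as in the question
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Bare names of files already present in the download folder
files = [f for f in listdir(folder_location) if isfile(join(folder_location, f))]

for link in soup.select("a[href$='v2.pdf']"):
    name = link['href'].split('/')[-1]
    if name not in files:  # skip anything downloaded on a previous run
        with open(os.path.join(folder_location, name), 'wb') as f:
            f.write(requests.get(urljoin(url, link['href'])).content)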
Norton409