I'm trying to write a script that iterates through a list of landing page URLs from a CSV file, appends every PDF link found on each landing page to a list, and then iterates through that list, downloading the PDFs to a specified folder.
I'm stuck on the last step: I can collect all the PDF URLs, but each download overwrites the previous one because the file path never changes. I'm not sure how best to vary the path for each URL so that every PDF gets its own unique file name (I've put a rough sketch of what I'm aiming for after my current code).
Any help would be appreciated!
from bs4 import BeautifulSoup
import requests
#example url
url = "https://beta.companieshouse.gov.uk/company/00445790/filing-history"
link_list = []
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")
for a in soup.find_all('a', href=True):
    # keep only the links that point at filing documents
    if "document" in a['href']:
        link_list.append("https://beta.companieshouse.gov.uk" + a['href'])

for url in link_list:
    response = requests.get(url)
    # this path never changes, so every PDF overwrites the same report.pdf
    with open('C:/Users/Desktop/CompaniesHouse/report.pdf', 'wb') as f:
        f.write(response.content)
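
For what it's worth, this is the direction I'm considering. The landing_pages.csv file name, the output folder, and the idea of using the last segment of each document URL as the file name are all placeholders I've made up; I'm not sure it's the best approach:

import csv
import os
import requests
from bs4 import BeautifulSoup

BASE = "https://beta.companieshouse.gov.uk"
OUT_DIR = "C:/Users/Desktop/CompaniesHouse"   # placeholder output folder

# collect document links from every landing page listed in the CSV
# (assumes a one-column, header-less file called landing_pages.csv)
link_list = []
with open("landing_pages.csv", newline="") as csvfile:
    for row in csv.reader(csvfile):
        page = requests.get(row[0])
        soup = BeautifulSoup(page.content, "lxml")
        for a in soup.find_all("a", href=True):
            if "document" in a["href"]:
                link_list.append(BASE + a["href"])

# download each PDF, deriving a unique name from the URL itself:
# here I just take the last path segment, falling back to an index
for i, pdf_url in enumerate(link_list):
    name = pdf_url.rstrip("/").split("/")[-1] or f"document_{i}"
    out_path = os.path.join(OUT_DIR, f"{name}.pdf")
    response = requests.get(pdf_url)
    with open(out_path, "wb") as f:
        f.write(response.content)

Is deriving the name from the URL like this sensible, or is there a better way to make sure each file path is unique?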