import os

import requests
from bs4 import BeautifulSoup

desktop = os.path.expanduser("~/Desktop")

url = 'https://www.ici.org/research/stats'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every anchor whose href mentions an Excel file
excel_files = soup.select('a[href*="xls"]')

for link_tag in excel_files:
    if 'Supplement: Worldwide Public Tables' in link_tag.text:
        link = 'https://www.ici.org' + link_tag['href']
        filename = link_tag['href'].split('/')[-1]
        filepath = os.path.join(desktop, filename)
        # Skip files we have already downloaded
        if os.path.isfile(filepath):
            print('*** File already exists: %s ***' % filename)
            continue
        resp = requests.get(link)
        with open(filepath, 'wb') as output:
            output.write(resp.content)
        print('Saved: %s' % filename)
I am new to web scraping and I want to automatically download a PDF document from a list of websites.
The document is updated monthly, and each update changes its URL on the website, e.g. https://fundcentres.lgim.com/fund-centre/OEIC/Sterling-Liquidity-Fund. I want to download the 'factsheet' PDF from that page. Ideally the code would "click" the factsheet link and save the file to a location on the drive. The difficulty is that the URL changes!
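Since the link text ("Factsheet") stays the same even though the target URL changes each month, one approach is to re-scrape the page on each run, locate the anchor whose text or href mentions the factsheet, and download whatever URL it currently points to. A minimal sketch of that idea follows; the `find_factsheet_url` helper and the sample HTML are my own illustrations, not code from the site, and note that if the LGIM fund centre renders its links with JavaScript, plain `requests` will not see them and you would need a browser-automation tool such as Selenium instead:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_factsheet_url(html, base_url):
    """Return the absolute URL of the first link that looks like a factsheet."""
    soup = BeautifulSoup(html, 'html.parser')
    for a in soup.find_all('a', href=True):
        text = a.get_text(strip=True).lower()
        if 'factsheet' in text or 'factsheet' in a['href'].lower():
            # urljoin resolves both relative and absolute hrefs correctly
            return urljoin(base_url, a['href'])
    return None


# Hypothetical snippet standing in for the fund page's HTML
sample = '<a href="/docs/2020-06-factsheet.pdf">Factsheet</a>'
print(find_factsheet_url(sample, 'https://fundcentres.lgim.com'))
```

With the URL in hand, the download step is the same as in the ICI script above: `resp = requests.get(pdf_url)` followed by writing `resp.content` to a file opened in `'wb'` mode. Matching on the link text rather than the href means the script keeps working when only the URL changes.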