I am completing a Masters in Data Science. I am working on a Text Mining assignment. In this project, I intend to download several PDFs from a website. In this case, I want to scrape and save the document called "Prospectus".
Below is the code which I am using in Python. The prospectus which I wish to download is show in screenshot below. However, the script returns different documents on the web page. Is there something which I need to change within my script?
import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
url = "https://www.ishares.com/us/products/239726/ishares-core-sp-500-etf"
# If there is no such folder, the script will create one automatically
folder_location = r'.\Output'
if not os.path.exists(folder_location): os.mkdir(folder_location)
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for link in soup.select("a[href$='.pdf']"):
# Name the pdf files using the last portion of each link which are unique in this case
filename = os.path.join(folder_location, link['href'].split('/')[-1])
with open(filename, 'wb') as f:
f.write(requests.get(urljoin(url, link['href'])).content)