
I'm looking to click the "download as pdf" button on this site: https://www.goffs.com/sales-results/sales/december-nh-sale-2021/1

The reason I can't just scrape the download link or manually download it is that there are multiple pages like this:

https://www.goffs.com/sales-results/sales/december-nh-sale-2021/2

https://www.goffs.com/sales-results/sales/december-nh-sale-2021/3

And I want to loop through all of them and download each as a pdf.

Current code:

import urllib.request
from requests import get
from bs4 import BeautifulSoup

url = "https://www.goffs.com/sales-results/sales/december-nh-sale-2021/1"

request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
  • https://stackoverflow.com/questions/37164675/clicking-button-with-requests – Andrew Ryan May 25 '22 at 20:51
  • Technically, it looks like you don't need to click a button; just search the source code and download the pdf link. For example, for page 2: https://www.goffs.com/GoffsCMS/_Sales/354/2.pdf – rv.kvetch May 25 '22 at 20:52

1 Answer


This code should get the link to the pdf:

from urllib.request import Request, urlopen

url = "https://www.goffs.com/sales-results/sales/december-nh-sale-2021/{}".format("1")

# fetch the page and split on the first link under GoffsCMS/_Sales/
request = Request(url)
response = urlopen(request)
content = response.read().decode().split('<a href="https://www.goffs.com/GoffsCMS/_Sales/')
content = content[1].split('"')[0]

# rebuild the absolute pdf url
output = 'https://www.goffs.com/GoffsCMS/_Sales/' + content
print(output)
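
To actually loop through the pages and save each PDF, here is a minimal sketch along the same lines; the page range (1 to 3) and the output filenames are assumptions, not taken from the site:

from urllib.request import Request, urlopen

BASE = "https://www.goffs.com/sales-results/sales/december-nh-sale-2021/{}"
MARKER = '<a href="https://www.goffs.com/GoffsCMS/_Sales/'

for page in range(1, 4):  # assumed page range, adjust as needed
    # fetch the sale-results page and pull out the first _Sales link
    html = urlopen(Request(BASE.format(page))).read().decode()
    pdf_url = "https://www.goffs.com/GoffsCMS/_Sales/" + html.split(MARKER)[1].split('"')[0]
    # download the pdf itself and write it to disk (hypothetical filename pattern)
    with open("goffs-page-{}.pdf".format(page), "wb") as f:
        f.write(urlopen(Request(pdf_url)).read())
    print("saved", pdf_url)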
  • Nice. The other possible optimization I could see: given that each pdf link has the same base url and the same format (e.g. `{page}.pdf`), it actually might not be worth it to even make the HTTP request. That is, you could probably string-format it based on the input page number alone. – rv.kvetch May 25 '22 at 21:16
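
A sketch of that suggestion, assuming the sale id (354, taken from the example link in the comments) and the {page}.pdf pattern hold for every page of this sale:

from urllib.request import Request, urlopen

SALE_ID = 354             # assumed from the example link above, may differ for other sales
for page in range(1, 4):  # assumed page range
    pdf_url = "https://www.goffs.com/GoffsCMS/_Sales/{}/{}.pdf".format(SALE_ID, page)
    with open("{}.pdf".format(page), "wb") as f:
        f.write(urlopen(Request(pdf_url)).read())
    print("saved", pdf_url)

This skips one HTTP request per page, but it breaks if Goffs ever changes the numbering scheme, so the scraping version above is the safer fallback.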