
I am trying to learn how to pull data from this url: https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview

However, the problem is that the URL doesn't change when I switch pages, so I am not sure how to enumerate or loop through them. I'm looking for a better approach, since the page has about 3,000 sales records.

Here is my starting code. It is very simple, but I would appreciate any help or hints. I think I might need to switch to another package, but I am not sure which one. Maybe BeautifulSoup?

import requests
import pandas as pd

url = "https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview"

html = requests.get(url).content
df_list = pd.read_html(html, header=1)[0]
df_list = df_list.drop([0, 1, 2])  # drop unnecessary rows
James Ho
  • Does this answer your question? [Scrape a dynamic website](https://stackoverflow.com/questions/206855/scrape-a-dynamic-website) – gre_gor Aug 31 '22 at 21:39

1 Answer


The page loads new results via a POST request, so you can get the data from more pages by posting the form data with a different pageNum each time:

import requests
import pandas as pd
from bs4 import BeautifulSoup


data = {
    "folder": "auctionResults",
    "loginID": "00",
    "pageNum": "1",
    "orderBy": "AdvNum",
    "orderDir": "asc",
    "justFirstCertOnGroups": "1",
    "doSearch": "true",
    "itemIDList": "",
    "itemSetIDList": "",
    "interest": "",
    "premium": "",
    "itemSetDID": "",
}

url = "https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview"


all_data = []

for data["pageNum"] in range(1, 3):  # <-- increase number of pages here.
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    for row in soup.select("#searchResults tr")[2:]:
        tds = [td.text.strip() for td in row.select("td")]
        all_data.append(tds)

columns = [
    "SEQ NUM",
    "Tax Year",
    "Notices",
    "Parcel ID",
    "Face Amount",
    "Winning Bid",
    "Sold To",
]

df = pd.DataFrame(all_data, columns=columns)

# print last 10 items from dataframe:
print(df.tail(10).to_markdown())

Prints:

|     | SEQ NUM | Tax Year | Notices | Parcel ID        | Face Amount | Winning Bid | Sold To  |
|----:|:--------|---------:|:--------|:-----------------|:------------|:------------|:---------|
|  96 | 000094  |     2020 |         | 00031-18-001-000 | $905.98     | $81.00      | 00005517 |
|  97 | 000095  |     2020 |         | 00031-18-002-000 | $750.13     | $75.00      | 00005517 |
|  98 | 000096  |     2020 |         | 00031-18-003-000 | $750.13     | $75.00      | 00005517 |
|  99 | 000097  |     2020 |         | 00031-18-004-000 | $750.13     | $75.00      | 00005517 |
| 100 | 000098  |     2020 |         | 00031-18-007-000 | $750.13     | $76.00      | 00005517 |
| 101 | 000099  |     2020 |         | 00031-18-008-000 | $905.98     | $84.00      | 00005517 |
| 102 | 000100  |     2020 |         | 00031-19-001-000 | $1,999.83   | $171.00     | 00005517 |
| 103 | 000101  |     2020 |         | 00031-19-004-000 | $1,486.49   | $131.00     | 00005517 |
| 104 | 000102  |     2020 |         | 00031-19-006-000 | $1,063.44   | $96.00      | 00005517 |
| 105 | 000103  |     2020 |         | 00031-20-001-000 | $1,468.47   | $126.00     | 00005517 |
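Since the site has roughly 3,000 rows, instead of hardcoding `range(1, 3)` you can keep requesting pages until one comes back empty. Here is a minimal, self-contained sketch of that loop; `fetch_rows` is a hypothetical stand-in (my own name, not from the site) for the `requests.post` + BeautifulSoup parsing step above:

```python
def fetch_rows(page_num):
    # Stand-in for the real request: pretend the site has 2 pages
    # of 3 rows each, and returns an empty list past the last page.
    pages = {
        1: [["000001", "2020", "", "00031-18-001-000", "$905.98", "$81.00", "00005517"]] * 3,
        2: [["000004", "2020", "", "00031-18-002-000", "$750.13", "$75.00", "00005517"]] * 3,
    }
    return pages.get(page_num, [])


def scrape_all():
    all_data = []
    page = 1
    while True:
        rows = fetch_rows(page)
        if not rows:  # an empty page means there are no more results
            break
        all_data.extend(rows)
        page += 1
    return all_data


rows = scrape_all()
print(len(rows))  # 6 rows across the 2 stand-in pages
```

On the live site, `fetch_rows` would set `data["pageNum"] = page_num`, POST to the URL, and return the list of `<td>` texts parsed from `#searchResults`, exactly as in the loop in the answer above.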
Andrej Kesely