This is my first post so I apologize if it is a duplicate but I could not find an answer relevant to mine. If there is one please let me know and I will check it out.
I am attempting to scrape the website below to find the Berkeley rent ceiling. The trouble I'm having is that I can't figure out how to submit an address to the search box and then scrape the info from the results page. In the past the URLs I've worked with changed with the search input, but not on this website. I thought my best bet would be using bs4 to scrape the info and requests.Session / requests.post to get to each subsequent address.
#Berkeley Rent Scrape
from bs4 import BeautifulSoup
import sys
import requests
import openpyxl
import pprint
import csv

#wb = openpyxl.load_workbook('workbook.xlsx', data_only=True)
#sheet = wb.get_sheet_by_name('worksheet')

props_payload = {'aspnetForm': '1150 Oxford St'}
URL = 'http://www.ci.berkeley.ca.us/RentBoardUnitSearch.aspx'

s = requests.Session()
p = s.post(URL, data=props_payload)

soup = BeautifulSoup(p.text, 'html.parser')
data = soup.find_all('td', class_='gridItem')
UPDATE: How do you get the info from the new webpage once the POST has been sent? In other words, what is the framework for using requests.post and then a requests.get or bs4 scrape when the URL does not change?
I was thinking it would look something like this, but I'm sure I need a GET request somewhere in there; I don't understand how sessions work when the URL doesn't change.
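One common shape for this, sketched below under the assumption that the page is a standard ASP.NET WebForms form (I haven't run this against the live site): GET the page once so the session has its cookies and you can read the hidden state fields out of the form, merge in the search term, then POST everything back to the same URL and parse the response body. The search box's field name below is only a placeholder guess; the real name has to come from the form's HTML.

```python
# Hedged sketch of a POST-then-parse flow for an ASP.NET WebForms page.
# The hidden __VIEWSTATE/__EVENTVALIDATION fields from a GET must be
# echoed back in the POST; the search-box field name is a guess.
import requests
from bs4 import BeautifulSoup

URL = 'http://www.ci.berkeley.ca.us/RentBoardUnitSearch.aspx'

def hidden_fields(html):
    """Collect every <input type="hidden"> name/value pair on the page."""
    soup = BeautifulSoup(html, 'html.parser')
    return {inp['name']: inp.get('value', '')
            for inp in soup.find_all('input', type='hidden')
            if inp.get('name')}

def search(address):
    with requests.Session() as s:
        # 1. GET the search page: the session picks up its cookies and
        #    we can read the hidden ASP.NET state fields out of the form.
        payload = hidden_fields(s.get(URL).text)
        # 2. Add the search term. Replace this key with the search box's
        #    actual name attribute from view-source (placeholder guess).
        payload['ctl00$ContentPlaceHolder1$txtSearch'] = address
        # 3. POST back to the same URL. The response body is the results
        #    page, even though the address bar never changes in a browser.
        results = s.post(URL, data=payload)
    return BeautifulSoup(results.text, 'html.parser').find_all(
        'td', class_='gridItem')

if __name__ == '__main__':
    for cell in search('1150 Oxford St'):
        print(cell.get_text(strip=True))
```

The point is that the session object, not the URL, carries the state between the GET and the POST, so the "next page" is just whatever HTML comes back in the POST response.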
I will be exporting the info to a CSV file and an Excel sheet, but I can deal with that later. Just want to get the meat out of the way.
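For the CSV step later, a minimal sketch, assuming `data` holds the `<td class="gridItem">` cells from the scrape above; the `columns` count is a guess, since it depends on how many columns the results grid actually has:

```python
# Hedged sketch of the CSV export, given the <td class="gridItem"> cells.
import csv

def cell_texts(cells):
    """Pull the stripped text out of a list of bs4 <td> tags."""
    return [cell.get_text(strip=True) for cell in cells]

def write_csv(texts, path, columns=1):
    """Write the cell texts to a CSV, `columns` cells per row.
    (The real column count depends on the results grid's layout.)"""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for i in range(0, len(texts), columns):
            writer.writerow(texts[i:i + columns])
```

The same `cell_texts` list can be appended row-by-row to an openpyxl worksheet for the Excel side.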
Thank you for any help!