This is related to a question I asked here, which was answered perfectly. Now that I have something working, I want to go a step further: instead of entering a URL manually to scrape data from, I want to write a function that takes just the address and zip code and returns the data I want.
The problem is building the correct URL from those inputs. For example, here is the URL for one listing:
```python
url = 'https://www.remax.com/realestatehomesforsale/25-montage-way-laguna-beach-ca-92651-gid100012499996.html'
```
Besides the street address, city, state, and zip code, the URL ends with a number (gid100012499996) that seems to be unique to each address. Since I don't know how to derive that number from the address alone, I am not sure how to write the function I want.
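One way to see which parts of the URL can be derived from the address and which cannot is to split the example URL into its pieces. The sketch below assumes every listing URL ends in `<address-slug>-gid<digits>.html`, a pattern inferred only from the single example above; `split_listing_url` is a hypothetical helper, not part of any library.

```python
import re
from urllib.parse import urlparse

# Assumption: listing URLs end in "<address-slug>-gid<digits>.html",
# inferred from one example URL only.
LISTING_RE = re.compile(r"^(?P<slug>.+)-gid(?P<gid>\d+)\.html$")


def split_listing_url(url):
    """Split a listing URL into its address slug and its gid number."""
    # Keep only the last path segment, e.g. "25-montage-way-...-gid123.html".
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    match = LISTING_RE.match(last_segment)
    if match is None:
        raise ValueError(f"URL does not match the expected pattern: {url}")
    return match.group("slug"), match.group("gid")


slug, gid = split_listing_url(
    "https://www.remax.com/realestatehomesforsale/"
    "25-montage-way-laguna-beach-ca-92651-gid100012499996.html"
)
print(slug)  # 25-montage-way-laguna-beach-ca-92651
print(gid)   # 100012499996
```

The slug is just the address, city, state, and zip code lowercased and joined with hyphens, so it can be rebuilt from the inputs; the gid carries no visible relation to the address and has to come from somewhere else.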
Here is my code:
```python
import urllib.request  # "import urllib" alone does not load the request submodule

from bs4 import BeautifulSoup
import pandas as pd


def get_data(url):
    # Browser-like headers so the site serves the normal HTML page
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}
    request = urllib.request.Request(url, headers=hdr)
    html = urllib.request.urlopen(request).read()
    soup = BeautifulSoup(html, 'html.parser')
    # The square footage is shown in <span class="listing-detail-sqft-val">
    foot = soup.find('span', class_="listing-detail-sqft-val")
    print(foot.text.strip())


url = 'https://www.remax.com/realestatehomesforsale/25-montage-way-laguna-beach-ca-92651-gid100012499996.html'
get_data(url)
```
What I want is something like the above, except that get_data() takes the address, state, and zip code instead of a URL. My apologies if this is not a suitable question for this site.
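As a partial step toward that function, the address part of the URL can be rebuilt from its components. The sketch below assumes the site's slug is each component lowercased with runs of non-alphanumeric characters replaced by single hyphens (inferred from one example URL), and it includes the city because the example URL contains `laguna-beach`. `slugify`, `build_listing_url`, and `BASE_URL` are hypothetical names, and the `gid` must still be supplied by the caller, since nothing in the address determines it.

```python
import re

# Assumption: all listings live under this path, as in the example URL.
BASE_URL = "https://www.remax.com/realestatehomesforsale/"


def slugify(text):
    """Lowercase text and collapse each run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_listing_url(street, city, state, zipcode, gid):
    """Rebuild a listing URL; the gid must be supplied, it cannot be derived."""
    parts = [street, city, state, str(zipcode)]
    slug = "-".join(slugify(part) for part in parts)
    return f"{BASE_URL}{slug}-gid{gid}.html"


url = build_listing_url("25 Montage Way", "Laguna Beach", "CA", "92651", "100012499996")
print(url)
```

With a gid in hand, the result can be passed straight to get_data(). Finding the gid for an arbitrary address would need some separate lookup on the site (for example its search results), which this sketch does not attempt; it is also unverified whether the site's slug rules match `slugify` for addresses with unit numbers or punctuation.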