I'm using Python to scrape a real estate listings webpage. Currently I navigate to the site (https://www.mlslistings.com/), type a zip code into the search box, click 'search', and copy-paste the resulting url into my script to be scraped.

Ultimately I'd like to loop over a list of zip codes, obtaining the url for each page to be scraped. So first I need to figure out how to do this once -- starting from the homepage, I'm hoping to access the page that follows the entry of a zip code into the search box.

I'm under the impression that this can be done using requests.post() together with a 'data' dictionary that points to the field I wish to populate and the corresponding input text.

I've tried

import requests

url = 'https://www.mlslistings.com'
data = {'input id': '94618'}
page = requests.post(url, data=data)
response = page.text

Here, I've used the key 'input id' based on inspecting the homepage, where the search bar appears as the following element:

<input id="searchText" type="text" name="searchText" class="form-control font-size-lg" placeholder="California City, Zip, Address, School District, MLS #" data-type="search" maxlength="300" aria-label="California City, Zip, Address, School District, MLS #" autocomplete="off">

I expected the resulting response object to be the html document containing the information displayed on the page obtained when I manually search for '94618' in my browser. Instead, it looks like the html document of the homepage itself.

Am I incorrectly naming the key in the data dictionary, or going wrong somewhere else? Any help would be greatly appreciated.

Michael Boles
  • Everything is not as simple as you expect. Check [this answer](https://stackoverflow.com/a/55919705/10824407), I've described some basic steps. – Olvin Roght May 14 '19 at 21:10
  • I'm stuck on step 2: inspecting the search bar that I wish to post text into does not reveal any text similar to `"href=javascript:downloadnow(1);"`. – Michael Boles May 15 '19 at 00:15
  • Because there isn't one :). The search button just submits its parent form. Check the names and values of all inputs inside this form; that is your post data. Or just run the following code in the browser console: `Array.from(document.getElementById("homesearchbar").getElementsByTagName("input")).forEach(function(e) {console.log(e.name + '=' + e.value)})` – Olvin Roght May 15 '19 at 22:19
  • It's strange that you got only 4 items, because I get 5. One of these items is `__RequestVerificationToken`, which is very important to include in the post data. You should continue your research ;) *Btw, isn't it easier to ask the website owner for an API?* – Olvin Roght May 17 '19 at 01:07
  • Apologies for the deleted comment -- I intended to rephrase my question but you'd already responded. I do in fact see `__RequestVerificationToken`. I added it to my data dictionary and re-ran `page = requests.post(url, data=data)`, but again cannot find anything useful in the resulting `page.text` object. I was under the impression that this is how people pull data from web pages. It sounds like familiarizing myself with APIs may be a better approach. Is there a resource you'd recommend that will help me to understand and use an API? Thanks so much for your help. – Michael Boles May 17 '19 at 03:04
  • First of all, you should check whether this website has an API. *Scraping data from websites without an agreement with the website owner is forbidden.* – Olvin Roght May 17 '19 at 07:32
  • APIs like Zillow's seem to offer property-level data (price estimate, beds, baths, etc.) only following a query on that address, not for all homes within a given area. This seems like a big limitation on the utility of such a service. As for the permissibility of scraping, fortunately, the terms of service for MLS Listings (https://www.mlslistings.com/more/terms), in contrast to those of other leading real estate platforms (e.g., Zillow, Trulia, Redfin), do not mention any prohibition of web scraping or local storage of data. So it seems like scraping the MLS web page is my best bet for this project. – Michael Boles May 19 '19 at 17:29
  • Then continue your research and you'll succeed. In my first comment I linked my answer to a similar question, which should help you emulate a request that the server will accept. You can skip the first section and start reading from `BUT THAT'S NOT ALL.`. – Olvin Roght May 20 '19 at 01:24
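
The approach suggested in the comments can be sketched roughly as follows. Form data is keyed by each input's `name` attribute (here `searchText`), not by the literal text 'input id', and hidden inputs such as `__RequestVerificationToken` have to be sent along with it. This is only a sketch under several assumptions: that the search form is present in the static HTML of the homepage, that it is submitted with POST to the form's `action`, and that the token is tied to cookies from the same session. None of this has been verified against the live site. The names `FormCollector`, `build_search_request` and `search` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormCollector(HTMLParser):
    """Record the action and the named <input> fields of every <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per form: {"action": str, "fields": dict}
        self._current = None   # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "fields": {}}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Inputs without a name are never submitted, so they are skipped.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def build_search_request(home_html, base_url, query, field="searchText"):
    """Return (url, payload) for the form that contains the search field."""
    parser = FormCollector()
    parser.feed(home_html)
    for form in parser.forms:
        if field in form["fields"]:
            payload = dict(form["fields"])   # keeps hidden fields such as the token
            payload[field] = query
            return urljoin(base_url, form["action"]), payload
    raise ValueError(f"no form with an input named {field!r} found")


def search(zip_code, base_url="https://www.mlslistings.com/"):
    """Fetch the homepage, then submit the search form within the same session."""
    import requests  # third-party; imported here so the helpers above need only the stdlib

    with requests.Session() as session:      # the session carries cookies between calls
        home = session.get(base_url, timeout=30)
        home.raise_for_status()
        url, payload = build_search_request(home.text, base_url, zip_code)
        result = session.post(url, data=payload, timeout=30)  # assumes a POST form
        result.raise_for_status()
        return result.url, result.text
```

Looping over zip codes would then amount to calling `search(z)` for each `z` in the list and keeping the returned URL. If the response is still the homepage, the form is probably submitted by JavaScript or needs extra headers, and the actual request should be inspected in the browser's network tab.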