I recently finished a web scraping/automation Zillow program for my boot camp. My instructor encouraged me to use Google, since I was only able to get the first couple of listings.
I stumbled upon this answer: Zillow web scraping using Selenium & BeautifulSoup
This worked well: instead of using bs4's `find_all` method, I was able to get all of my listings neatly placed in a JSON file, which was much easier to go through and made the project much easier to complete. I only recently learned about regex and Python's `re` module, so I was wondering if someone could explain how this code retrieves the nicely structured JSON from the GET response, and whether the same approach would work for other websites.
The code was:
self.data = json.loads(re.search(r'!--(\{"queryState".*?)-->', self.response.text).group(1))
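A minimal, self-contained sketch of what that one line does. It assumes, as the pattern itself implies, that the page embeds its data as JSON inside an HTML comment starting with `<!--{"queryState"`. The `page_text` string and its `listings`/`address` keys are made up for illustration: they stand in for `self.response.text` and are not Zillow's real structure.

```python
import json
import re

# Hypothetical stand-in for self.response.text. The real page is assumed to
# hide its search data as JSON inside an HTML comment: <!--{...}-->
page_text = (
    '<html><body><h1>Homes</h1>'
    '<!--{"queryState": {"page": 1}, '
    '"listings": [{"address": "1 Main St"}, {"address": "2 Oak Ave"}]}-->'
    '</body></html>'
)

# Same line as in the question:
#   1. re.search finds the comment that starts with {"queryState"
#   2. .group(1) keeps only the JSON text between "!--" and "-->"
#   3. json.loads turns that JSON text into Python dicts and lists
data = json.loads(re.search(r'!--(\{"queryState".*?)-->', page_text).group(1))

print(type(data))                   # <class 'dict'>
print(data["listings"][1]["address"])  # 2 Oak Ave
```

Once the JSON is a plain dict, every listing is available at once, which is why this avoids the "only the first few listings" problem seen when parsing the rendered HTML.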
- What arguments are taken into account by `json.loads`?
- How does the oddly written pattern `!--(\{"queryState".*?)-->` work?
- What is the purpose of `.group(1)`?
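The three questions can be answered by taking the line apart. The sketch below is only illustrative: the `text` string is hypothetical, and the pattern is the same one rewritten with `re.VERBOSE` so each piece can carry a comment. It shows what the capture group holds, why the lazy `.*?` matters, and what `json.loads` receives.

```python
import json
import re

# Hypothetical text with a second comment after the JSON one,
# so the difference between lazy and greedy matching is visible.
text = '<!--{"queryState": {"page": 1}}--><p>hi</p><!--other-->'

# Same pattern as in the question, written with re.VERBOSE so it can be commented.
pattern = re.compile(r'''
    !--               # literal "!--" (the "<" before it is simply not required)
    (                 # open capture group 1: the part we want to keep
      \{"queryState"  # literal {"queryState" with the brace escaped as \{
      .*?             # any characters, as few as possible (lazy)
    )                 # close capture group 1
    -->               # literal "-->" that ends the HTML comment
''', re.VERBOSE)

match = pattern.search(text)
whole = match.group(0)  # everything the pattern matched, markers included
inner = match.group(1)  # only what sat inside the parentheses
print(whole)  # !--{"queryState": {"page": 1}}-->
print(inner)  # {"queryState": {"page": 1}}

# A greedy .* runs on to the LAST "-->", which is no longer valid JSON.
greedy = re.search(r'!--(\{"queryState".*)-->', text).group(1)
print(greedy)  # {"queryState": {"page": 1}}--><p>hi</p><!--other

# json.loads takes one string of JSON text and returns Python objects.
data = json.loads(inner)
print(data["queryState"]["page"])  # 1
```

In short: `json.loads` only receives the string returned by `.group(1)`; `.group(1)` strips the `!--`/`-->` comment markers that `.group(0)` would include; and the lazy `.*?` stops at the first `-->` so the JSON is not glued to the rest of the page.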
I hate just copying and pasting, but somehow this worked like magic, and I'd like to understand it so I can replicate it in future projects. Sorry if this is a loaded question; the `re.search` documentation wasn't as helpful as I had hoped.
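For reuse on other sites, the same three steps could be wrapped in a small helper. This is a hypothetical sketch (`extract_comment_json` is not a library function): it only helps on pages that embed JSON in an HTML comment shaped like `<!--{"some_key": ...}-->`. Sites that store their data elsewhere, for example inside a `<script>` tag, would need a different pattern. It also handles the case where `re.search` finds nothing and returns `None`.

```python
import json
import re
from typing import Any, Optional


def extract_comment_json(html: str, first_key: str) -> Optional[Any]:
    """Hypothetical helper: return the JSON stored in an HTML comment that
    starts with {"<first_key>", or None if the page has no such comment."""
    # re.escape makes sure characters in the key are matched literally.
    pattern = r'<!--(\{"' + re.escape(first_key) + r'".*?)-->'
    # re.DOTALL lets . also match newlines, in case the JSON spans lines.
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        # re.search returns None on no match; calling .group(1) on None
        # would raise AttributeError, so bail out early instead.
        return None
    return json.loads(match.group(1))


# Usage on a made-up page:
sample = '<html><!--{"searchResults": {"count": 2}}--></html>'
result = extract_comment_json(sample, "searchResults")
missing = extract_comment_json(sample, "queryState")
print(result)   # {'searchResults': {'count': 2}}
print(missing)  # None
```

Whether this "works for other websites" depends entirely on whether those sites ship their data as JSON inside the HTML in this exact shape, so it is worth viewing the page source first to see where the data actually lives.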