While trying to extract information in a more readable format from the search results of a website, I ran into something that has me very puzzled.
I open the results page with Chrome's 'Inspect' feature, which gives a split pane where every element of the rendered page can be matched to its HTML counterpart.
Now, I am interested in parsing specific tags with an attribute that contains the substring "entry-price".
Every car record in the results has a price <span> element with the price info embedded in it. I'm using the price as the example, but the same applies to the other properties of each record returned by the search.
This page has 86 results, and the <span> elements whose data-testid attribute contains that value also number 86, at least in the Inspect view.
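For reference, here is a stdlib-only sketch of that filter. It collects the text of every <span> whose data-testid contains "entry-price", the same elements that BeautifulSoup's soup.select('span[data-testid*="entry-price"]') would return. The class and function names are illustrative, and any data-testid values used to try it out are made up; the real ones on the site may differ.

```python
from html.parser import HTMLParser


class PriceSpanCollector(HTMLParser):
    """Collects the text of <span> elements whose data-testid contains a substring."""

    def __init__(self, needle="entry-price"):
        super().__init__()
        self.needle = needle
        self.texts = []   # raw text gathered inside each matching span
        self._depth = 0   # > 0 while inside a matching span (tracks nested spans)

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # a nested span inside a matching one
            return
        # Attributes without a value come through as None, hence the `or ""`
        testid = dict(attrs).get("data-testid") or ""
        if self.needle in testid:
            self._depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.texts[-1] += data


def find_prices(html, needle="entry-price"):
    """Return the stripped text of every matching <span>, in document order."""
    parser = PriceSpanCollector(needle)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]
```

Running len(find_prices(...)) on the saved file and comparing the result with the 86 seen in the Inspect view is a quick way to reproduce the discrepancy outside the browser.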
The 'interesting' part is that when I saved the HTML of the page, I could see far fewer tags with those characteristics: only 5. To reduce the margin for error, I also used Chrome's 'View page source' on the page.
There, to my great surprise, a plain text search for 'entry-price' returns only 5 matches!
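One common cause of this kind of mismatch is that the Elements panel in DevTools shows the live DOM after JavaScript has run, while 'View page source' and a saved HTML file reflect what the server originally sent. Sites that build their result lists in the browser often ship the data as JSON inside a <script> tag. The sketch below checks a saved page for such a blob. It assumes, without having verified it for this site, a Next.js-style <script id="__NEXT_DATA__" type="application/json">; the id is a guess to adjust after searching the page source.

```python
import json
from html.parser import HTMLParser


class ScriptTextExtractor(HTMLParser):
    """Captures the raw text of the <script> element with a given id."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self.chunks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False

    def handle_data(self, data):
        # html.parser delivers <script> content unparsed, so JSON arrives intact
        if self._inside:
            self.chunks.append(data)


def extract_embedded_json(html, script_id="__NEXT_DATA__"):
    """Return the parsed JSON inside <script id=script_id>, or None if absent."""
    parser = ScriptTextExtractor(script_id)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks).strip()
    return json.loads(text) if text else None
```

A None result for the saved page would rule out this particular explanation; a non-empty result is worth searching for the prices that are missing from the <span> tags.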
Here's the full link, if you want to try for yourself: https://www.willhaben.at/iad/gebrauchtwagen/auto/gebrauchtwagenboerse?ENGINE/FUEL=100001&WHEEL_DRIVE=3&EQUIPMENT=11&sort=3&CAR_MODEL/MAKE=1042&sfId=e7ce8b54-db41-419b-a7e2-edd5f23501eb&isNavigation=true&CAR_MODEL/MODEL=1774&rows=100&page=2&YEAR_MODEL_FROM=2018&YEAR_MODEL_TO=2021
(it's the 2nd page of 186 total results, 100 results per page)
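The 86 matches the query string: with rows=100 and page=2, page 1 holds the first 100 of the 186 results, which leaves 186 - 100 = 86 for this page. A small sketch that derives this from the URL's rows and page parameters (the function name is illustrative):

```python
from urllib.parse import parse_qs, urlsplit


def expected_results_on_page(url, total_results):
    """How many results the page in `url` should show, given its rows/page parameters."""
    query = parse_qs(urlsplit(url).query)
    rows = int(query["rows"][0])
    page = int(query["page"][0])
    already_shown = (page - 1) * rows  # results that landed on earlier pages
    return max(0, min(rows, total_results - already_shown))
```

For the link above with 186 total results, this gives 86, the same count the Inspect view shows, so the browser is displaying the complete page.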
How is that possible? I can't make sense of it at all.
BTW, the reason I looked at the page source is that I have a small Python script, using BeautifulSoup, that parses the saved HTML and extracts what I need. It worked just fine with another search, but this one is giving me extra headaches.
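If the missing records turn out to be in the saved HTML as embedded JSON rather than as <span> tags (something to confirm in the file, not established here), the script would need to read the values from that JSON instead. Here is a minimal, hypothetical helper that walks any parsed JSON and yields every value stored under a given key. The key "price" in the example is a placeholder, because the site's real field names are unknown.

```python
import json


def find_values(obj, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_values(v, key)  # keep searching inside the value too
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


if __name__ == "__main__":
    # Hypothetical structure, for illustration only
    sample = json.loads('{"results": [{"id": 1, "price": 25900}, {"id": 2, "price": 31500}]}')
    print(list(find_values(sample, "price")))  # [25900, 31500]
```

Searching by key rather than by a fixed path tolerates not knowing how deeply the records are nested, at the cost of possibly matching unrelated fields that share the same name.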