Pages can change so drastically that building a very "smart" scraper is difficult; and even if it were possible, the scraper would be somewhat unpredictable, even with fancy techniques like machine learning. It's hard to make a scraper that is both trustworthy and automatically flexible.
Maintainability is something of an art form centered around how selectors are defined and used.
In the past I have rolled my own "two stage" selectors:
(find) The first stage is highly inflexible and checks the structure of the page leading to the desired element. If the first stage fails, it throws some kind of "page structure changed" error.
(retrieve) The second stage is somewhat flexible and extracts the data from the desired element on the page.
This insulates the scraper from drastic page changes with some level of automatic detection, while still maintaining a degree of trustworthy flexibility.
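For illustration, a minimal Python sketch of that two-stage pattern might look like the following (the URL, element names, XPath expressions, and error type here are all placeholders, not taken from any real page):

```python
import requests
from lxml import html


class PageStructureChangedError(Exception):
    """Raised when the strict first-stage (find) selector stops matching."""


def find_article_node(tree):
    # Stage 1 (find): deliberately inflexible. It asserts the page skeleton
    # we depend on; if the structure changes, fail loudly instead of
    # silently scraping the wrong thing.
    nodes = tree.xpath('//main[@id="articles"]/article[@class="post"]')
    if not nodes:
        raise PageStructureChangedError("main#articles > article.post not found")
    return nodes[0]


def retrieve_title(node):
    # Stage 2 (retrieve): flexible, scoped to the node found in stage 1.
    # A descendant search tolerates extra wrappers being added later.
    titles = node.xpath('.//*[contains(@class, "title")]/text()')
    return titles[0].strip() if titles else None


def scrape(url):
    tree = html.fromstring(requests.get(url, timeout=10).text)
    return retrieve_title(find_article_node(tree))
```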
I have frequently used XPath selectors, and with a little practice it is quite surprising how flexible you can be with a good selector while still being very accurate. I'm sure CSS selectors are similar. This gets easier the more semantic and "flat" the page design is.
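As a rough illustration of that flexibility/accuracy trade-off, here is a brittle XPath next to a more resilient one, run against a small made-up snippet of markup:

```python
from lxml import html

doc = html.fromstring("""
<html><body>
  <div class="content">
    <div class="deal featured">
      <span class="tag"><span class="price">$9.99</span></span>
    </div>
  </div>
</body></html>
""")

# Brittle: tied to the exact nesting depth, so any new wrapper element breaks it.
brittle = doc.xpath('/html/body/div/div/span/span/text()')

# Flexible but still accurate: anchored on meaningful class names rather than
# on position, so extra wrappers or reordered siblings don't matter.
flexible = doc.xpath('//div[contains(@class, "deal")]//span[contains(@class, "price")]/text()')

print(brittle, flexible)  # both yield ['$9.99'] on this markup
```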
A few important questions to answer are:
What do you expect to change on the page?
What do you expect to stay the same on the page?
The more accurately you can answer these questions, the better your selectors can become.
In the end, it's your choice how much risk you want to take and how trustworthy your selectors will be, both when finding and when retrieving data on a page; how you craft them makes a big difference. Ideally, it's best to get data from a web API, which hopefully more sources will begin providing.
EDIT: Small example
Using your scenario, where the element you want is at `.content > .deal > .tag > .price`, the general `.content .price` selector is very "flexible" regarding page changes; but if, say, a false-positive element arises, we may want to avoid extracting from this new element.
Using two-stage selectors we can specify a less general, more inflexible first stage like `.content > .deal`, and then a second, more general stage like `.price` to retrieve the final element using a query relative to the results of the first.
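In Python with BeautifulSoup, that two-stage version might look roughly like this (the function name and error handling are just one possible shape, not a prescribed implementation):

```python
from bs4 import BeautifulSoup


def extract_price(page_html):
    soup = BeautifulSoup(page_html, "html.parser")

    # Stage 1 (find): inflexible, structure-checking selector.
    deal = soup.select_one(".content > .deal")
    if deal is None:
        raise RuntimeError("page structure changed: .content > .deal not found")

    # Stage 2 (retrieve): general selector, scoped to the stage-1 element.
    price = deal.select_one(".price")
    return price.get_text(strip=True) if price else None
```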
So why not just use a selector like `.content > .deal .price`?
For my use, I wanted to be able to detect large page changes without running extra regression tests separately. I realized that rather than one big selector, I could write the first stage to include important page-structure elements. This first stage would fail (or report) if the structural elements no longer exist. Then I could write a second stage to more gracefully retrieve data relative to the results of the first stage.
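A sketch of how that plays out in practice, reusing the hypothetical `extract_price()` from above (the URL and the plain-logging "report" are placeholders for whatever alerting you prefer):

```python
import logging

import requests


def run_scraper():
    page = requests.get("https://example.com/deals", timeout=10).text
    try:
        return extract_price(page)
    except RuntimeError as err:
        # The failed first stage acts as a built-in regression check:
        # a structural page change surfaces here as a clear error
        # instead of as silently wrong data downstream.
        logging.error("scraper needs attention: %s", err)
        return None
```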
I shouldn't say that it's a "best" practice, but it has worked well.