I am trying to scrape content from shopping sites and save it in the Product table of my database. Scraping such content requires knowing the DOM structure of each site, and also the hierarchy of categories in its menu.
There are many solutions that achieve this by setting up a configuration for each site and then looking for the specific HTML elements that contain the data (e.g. product name, price, model, ...) using regex, XPath or CSS selectors.
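To make the per-site configuration approach concrete, here is a minimal sketch that uses only regular expressions from the Python standard library. The site names, the HTML patterns, and the `SITE_CONFIGS` and `extract_product` names are all hypothetical and chosen for illustration; a real scraper would more likely use XPath or CSS selectors through an HTML parser.

```python
import re

# Hypothetical per-site configuration: host names and patterns are
# illustrative only and do not describe any real shop.
SITE_CONFIGS = {
    "shop-a.example": {
        "name": r'<h1 class="product-title">(.*?)</h1>',
        "price": r'<span class="price">\s*\$?([\d.,]+)\s*</span>',
    },
    "shop-b.example": {
        "name": r'<div id="title">(.*?)</div>',
        "price": r'<b itemprop="price">([\d.,]+)</b>',
    },
}


def extract_product(site, html):
    """Apply the patterns configured for `site`; unmatched fields are None."""
    fields = {}
    for field, pattern in SITE_CONFIGS[site].items():
        match = re.search(pattern, html, re.DOTALL)
        fields[field] = match.group(1).strip() if match else None
    return fields
```

Every new shop needs its own entry in `SITE_CONFIGS`, and a pattern written for one site returns nothing on another, which is exactly the maintenance cost I would like to avoid.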
Is there any solution that avoids setting up a configuration for each site and scrapes the product properties automatically?
There are similar solutions that deal with news, like Readability, which looks for sequences of <p> tags and images. It is easier for news due to the similarity between news sites and their simple structure,