I'm building a deal aggregator, so I need a crawler that extracts data from several sites: price, discount, image, coordinates and, of course, the name of the deal.
Do you know of any tutorials, ebooks or other resources that would help me? For the image, coordinates and discount I already have a solution, based on these patterns:
- image: the biggest image on the page is always the main image of the deal
- discount: the discount is always a number between 50 and 99 and is always followed by a "%" symbol
- coordinates: these are always decimal numbers, so I extract them with a regex
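
For reference, here is a minimal sketch of those three heuristics in Python. The function names, the `(url, width, height)` tuple shape, and the requirement of at least four fractional digits for a coordinate are my own assumptions for illustration, not something the sites guarantee.

```python
import re

# Discount: a two-digit number from 50 to 99 followed by "%".
# The lookbehind keeps "150%" from being read as "50%".
DISCOUNT_RE = re.compile(r"(?<!\d)([5-9]\d)\s?%")

# Coordinates: signed decimal numbers. Requiring 4+ fractional digits is an
# assumption so that prices such as 19.99 are not mistaken for coordinates.
COORD_RE = re.compile(r"(?<![\d.])-?\d{1,3}\.\d{4,}(?![\d.])")


def find_discount(text):
    """Return the first discount percentage found in text, or None."""
    match = DISCOUNT_RE.search(text)
    return int(match.group(1)) if match else None


def find_coordinates(text):
    """Return the first two coordinate-like numbers as a pair, or None."""
    numbers = [float(n) for n in COORD_RE.findall(text)]
    return (numbers[0], numbers[1]) if len(numbers) >= 2 else None


def pick_main_image(images):
    """Pick the URL with the largest pixel area.

    images is a list of (url, width, height) tuples; how the dimensions
    are obtained (HTML attributes, downloading the file) is left open.
    """
    if not images:
        return None
    return max(images, key=lambda img: img[1] * img[2])[0]
```

The order of the two numbers returned by `find_coordinates` is simply the order in which they appear in the page, so whether that is latitude first depends on the site.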
How do I get the following items?
- Name of the deal?
- Price?
Do you know of any data extraction algorithms that can be helpful?