I am scraping three websites for their product data. All three are big supermarket chains in my region, and because they operate in the same region they usually sell the same products.
I want to build one curated collection containing all the products in the region, so that searching for 'CEREAL FLIPS DE CHOCOLATE 220 GR' returns that product for each supermarket. The problem is that the companies are independent of each other and keep separate inventories, so each one gives the same product a slightly different name.
Example:
- Market 1. Cereal Flips Chocolate 220Gr
- Market 2. CEREAL FLIPS CHOCOLATE 220G
- Market 3. CEREAL FLIPS DE CHOCOLATE 220 GR
Let's say I build one curated collection that holds the canonical name of each product and give it to the scrapers. The scraper for Market 1 runs and sees
- Market1: Cereal Flips Chocolate 220Gr
and looks for it in the curated collection, which holds
- CuratedCollection: CEREAL FLIPS DE CHOCOLATE 220 GR
Is there a way to match these nearly identical strings?
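
To make the question concrete, here is a minimal sketch of the kind of matching I have in mind. It assumes Python and only the standard library (`re` and `difflib`). The names `STOPWORDS`, `UNITS`, `normalize`, and `best_match`, as well as the 0.85 threshold, are hypothetical choices for illustration and would probably need tuning on real data. The idea is to normalize each name first (lowercase, split "220Gr" into "220 gr", unify unit spellings, drop filler words such as "de") and then compare the normalized strings with a similarity ratio.

```python
import re
from difflib import SequenceMatcher

# Hypothetical filler words and unit spellings; extend these for real data.
STOPWORDS = {"de", "del", "la", "el"}
UNITS = {"gr": "g", "grs": "g", "g": "g", "kg": "kg", "ml": "ml", "l": "l"}


def normalize(name):
    """Reduce a product name to a canonical, comparable form."""
    text = name.lower()
    # Separate a digit from a letter that directly follows it: "220gr" -> "220 gr".
    text = re.sub(r"(\d)([^\W\d])", r"\1 \2", text)
    # Keep only word tokens, dropping punctuation.
    tokens = re.findall(r"\w+", text)
    # Drop filler words and map unit variants to one spelling.
    tokens = [UNITS.get(t, t) for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)


def best_match(name, curated, threshold=0.85):
    """Return (curated_name, score) for the closest entry, or (None, score)
    when nothing reaches the threshold."""
    target = normalize(name)
    best, best_score = None, 0.0
    for candidate in curated:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


curated_collection = [
    "CEREAL FLIPS DE CHOCOLATE 220 GR",
    "LECHE ENTERA 1 L",
]

for scraped in ["Cereal Flips Chocolate 220Gr", "CEREAL FLIPS CHOCOLATE 220G"]:
    print(scraped, "->", best_match(scraped, curated_collection))
```

With the three example names above, normalization alone makes them identical, so the similarity ratio only matters for names that still differ afterwards (typos, abbreviations, reordered words). Comparing every scraped name against every curated name is a linear scan, which may be too slow for a large catalogue.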