
I want to show similar products, so-called variants, for a product. Currently I am doing it as follows:

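using System.Collections.Generic;
using System.Linq;

public IList<Product> GetVariants(string productName)
{
    // Using Entity Framework; dispose the context once the query has run
    using (var db = new EFContext())
    {
        return db.Products
                 .Where(product => product.ProductName == productName)
                 .ToList();
    }
}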

But this returns only exact matches, that is, the current product itself. I am thinking of using the Levenshtein distance as a basis for finding similar products. Before that, though, I want to check: what do most developers do to find variants?
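For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch of the standard two-row dynamic-programming version in C# might look like this; the `StringSimilarity` class and `Levenshtein` method are hypothetical names, not part of Entity Framework or any other library.

```csharp
using System;

public static class StringSimilarity
{
    // Returns the Levenshtein (edit) distance between a and b: the minimum
    // number of single-character insertions, deletions and substitutions.
    public static int Levenshtein(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        // prev is the previous row of the DP table, curr the row being filled.
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // "" -> b[0..j) costs j inserts

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // a[0..i) -> "" costs i deletes
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            // Swap rows so prev holds the row just computed.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }
}
```

Note that Entity Framework cannot translate a custom C# method like this into SQL, so a filter such as `p => StringSimilarity.Levenshtein(p.ProductName, productName) <= 2` would have to run in memory after the product names are loaded.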

  1. Is it a good idea to use the Levenshtein distance? Is it used in industry for this purpose?
  2. Do I have to add another table to the database that records the variants of each product when the product is added?

Bhushan Firake

1 Answer


I used the Jaro-Winkler distance effectively to account for typos in a system I wrote a while back. IMO, it's much better than a simple edit-distance calculation because it accounts for string length fairly well. See this question on SO for open-source implementations.
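As a rough illustration, here is a minimal Jaro-Winkler similarity sketch in C#, following the commonly published definition: characters match only within a window of max(len1, len2) / 2 - 1 positions, transpositions are counted among matched characters, and a common prefix of up to 4 characters is rewarded with Winkler's scale p = 0.1. It is not the implementation used in the answer, and the `JaroWinkler.Similarity` name is hypothetical. The result is a similarity in [0, 1], where 1.0 means identical.

```csharp
using System;

public static class JaroWinkler
{
    // Jaro-Winkler similarity in [0, 1]; 1.0 means identical strings.
    // prefixScale is Winkler's p (commonly 0.1; keep it <= 0.25 so the result stays <= 1).
    public static double Similarity(string s1, string s2, double prefixScale = 0.1)
    {
        if (s1 == null) throw new ArgumentNullException(nameof(s1));
        if (s2 == null) throw new ArgumentNullException(nameof(s2));
        if (s1.Length == 0 && s2.Length == 0) return 1.0;
        if (s1.Length == 0 || s2.Length == 0) return 0.0;

        // Two characters only count as a match if they are within this distance of each other.
        int window = Math.Max(0, Math.Max(s1.Length, s2.Length) / 2 - 1);
        var matched1 = new bool[s1.Length];
        var matched2 = new bool[s2.Length];
        int matches = 0;

        for (int i = 0; i < s1.Length; i++)
        {
            int start = Math.Max(0, i - window);
            int end = Math.Min(s2.Length - 1, i + window);
            for (int j = start; j <= end; j++)
            {
                if (matched2[j] || s1[i] != s2[j]) continue;
                matched1[i] = true;
                matched2[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0.0;

        // Walk the matched characters of both strings in order and count
        // positions where they differ (each transposition contributes two).
        int halfTranspositions = 0;
        int k = 0;
        for (int i = 0; i < s1.Length; i++)
        {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1[i] != s2[k]) halfTranspositions++;
            k++;
        }

        double m = matches;
        double t = halfTranspositions / 2.0;
        double jaro = (m / s1.Length + m / s2.Length + (m - t) / m) / 3.0;

        // Winkler boost: reward a shared prefix of up to 4 characters.
        int prefix = 0;
        int maxPrefix = Math.Min(4, Math.Min(s1.Length, s2.Length));
        while (prefix < maxPrefix && s1[prefix] == s2[prefix]) prefix++;

        return jaro + prefix * prefixScale * (1.0 - jaro);
    }
}
```

For example, `JaroWinkler.Similarity("MARTHA", "MARHTA")` comes out at about 0.961. Because each term is divided by the string lengths, the score is already normalized, which is one reason it can compare short and long product names more evenly than a raw edit count.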

I ended up writing it in C# and importing it into SQL Server as a SQL CLR function, but it was still relatively slow. It worked in my case mostly because such queries were executed infrequently (100-200 a day).

If you expect a lot of traffic, you'd have to build an index to make these lookups faster. One strategy would be to periodically compute the Jaro-Winkler score for each pair of products and store the pair in an index table if the score exceeds a certain threshold (i.e. the two products are similar enough). To reduce the amount of work, you can run this only once or twice a day and limit it to records that are new or modified since the last run. You can then quickly look up similar products and order them by score.
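The periodic indexing strategy above might be sketched roughly like this. It is an in-memory sketch under stated assumptions: `ProductRow`, `SimilarPair` and `VariantIndex` are hypothetical stand-ins for the product table and the index table (in practice the rows would live in SQL Server and be refreshed by a scheduled job), and the similarity function is passed in, so a Jaro-Winkler score or any other measure can be plugged in. Only products modified since the last run are compared against the catalogue, and qualifying pairs are stored in both directions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical product table row.
public sealed class ProductRow
{
    public ProductRow(int id, string name, DateTime modifiedUtc) { Id = id; Name = name; ModifiedUtc = modifiedUtc; }
    public int Id { get; }
    public string Name { get; }
    public DateTime ModifiedUtc { get; }
}

// Hypothetical index table row: "SimilarProductId is a variant of ProductId".
public sealed class SimilarPair
{
    public SimilarPair(int productId, int similarProductId, double score) { ProductId = productId; SimilarProductId = similarProductId; Score = score; }
    public int ProductId { get; }
    public int SimilarProductId { get; }
    public double Score { get; }
}

public static class VariantIndex
{
    // Recomputes index rows for products modified after lastRunUtc.
    // Pairs scoring at or above threshold are stored in both directions.
    public static void Refresh(List<SimilarPair> index, IReadOnlyList<ProductRow> products,
                               DateTime lastRunUtc, Func<string, string, double> similarity, double threshold)
    {
        var changed = products.Where(p => p.ModifiedUtc > lastRunUtc).ToList();
        var changedIds = new HashSet<int>(changed.Select(p => p.Id));

        // Drop stale rows that involve a changed product; they are recomputed below.
        index.RemoveAll(r => changedIds.Contains(r.ProductId) || changedIds.Contains(r.SimilarProductId));

        foreach (var c in changed)
        {
            foreach (var other in products)
            {
                if (other.Id == c.Id) continue;
                // When both products changed, compare the pair only once (from the lower Id).
                if (changedIds.Contains(other.Id) && other.Id < c.Id) continue;

                double score = similarity(c.Name, other.Name);
                if (score >= threshold)
                {
                    index.Add(new SimilarPair(c.Id, other.Id, score));
                    index.Add(new SimilarPair(other.Id, c.Id, score));
                }
            }
        }
    }

    // Returns the precomputed variants of a product, most similar first.
    public static List<SimilarPair> Lookup(IEnumerable<SimilarPair> index, int productId, int take = 10)
    {
        return index.Where(r => r.ProductId == productId)
                    .OrderByDescending(r => r.Score)
                    .Take(take)
                    .ToList();
    }
}
```

A refresh still costs one comparison per (changed product, catalogue product) pair, but the read path becomes a single filtered, ordered query on `ProductId`, which a database index on `(ProductId, Score)` could serve quickly.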

p.s.w.g