Okay guys, I am stumped here.

The problem is that people mistype product codes when searching, because the codes often contain several 0's in a row. For example, when searching for K00000WEFLZ they type the wrong number of 0's and the search returns nothing. I want the search to check whether the term contains a certain number of 0's after the letter "K" (codes starting with K will always have, say, 10+ characters and at least 4-5 0's) and, if it does, replace them with "*" during the search, so that the product is still found no matter how many 0's they type.

I am aware I will have to write a custom class and override the default for this (although a lot of it cannot be accessed because it is private). The default search pattern cannot be changed, as that would change it for everyone and I only want this for one specific site.

I also cannot use a wildcard at the start or the end of the term, as the catalog is huge and that would match far too many results.
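The idea described above can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, the threshold of two or more 0's is an assumption, and it presumes the search backend treats "*" as "any characters", which is not confirmed here.

```csharp
using System.Text.RegularExpressions;

public static class ZeroRunWildcard
{
    // Assumption: codes start with 'K' followed directly by a run of zeros.
    // The threshold (2 or more zeros) is a guess and should be tuned.
    private static readonly Regex LeadingZeroRun =
        new Regex("^K0{2,}", RegexOptions.IgnoreCase);

    // Returns the search term with the zero run after the leading K
    // replaced by a single '*'. Any other input is returned unchanged
    // (apart from trimming surrounding whitespace).
    public static string Normalize(string search)
    {
        if (string.IsNullOrEmpty(search))
            return search;

        return LeadingZeroRun.Replace(search.Trim(), "K*");
    }
}
```

With this sketch, both K00000WEFLZ and K0000WEFLZ become K*WEFLZ, so the wildcard sits in the middle of the term and never at the start or the end.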

As far as I know, this is the code that handles the search, taken from the class with the default logic:

protected virtual IList<Product> CreateCustomCollection()
{
        var list = new List<Product>();

        switch (mode)
        {
            case ProductRepeaterMode.Search:

                if (Page.Request.QueryString["search"] != null && Page.Request.QueryString["search"].Length != 0)
                {
                    bool[] customString = new bool[5] { SearchCustomString1, SearchCustomString2, SearchCustomString3, SearchCustomString4, SearchCustomString5 };
                    IList<Product> results = Fabric.ObjectProvider.Get<IProductSearchCommand>().Search(Page.Request.QueryString["search"], out searchWords, IncludeSkus, IsPublicFacing, customString, CoreHttpModule.Session);

                    var retailOrder = WebStoreManager.CurrentOrder as IRetailOrder;
                    var accountOrder = WebStoreManager.CurrentOrder as IAccountOrder;

                    IList<Product> productsToRemove = new List<Product>();
                    IList<Product> productsToAdd = new List<Product>();

                    foreach (var product in results)
                    {
                        if (hideRestrictedProducts)
                        {
                            if (retailOrder != null)
                            {
                                if (!product.CanBePurchasedByRetailCustomer || product.AgentOnly)
                                    productsToRemove.Add(product);
                            }
                            else
                            {
                                if (accountOrder != null)
                                {
                                    var add = false;

                                    if (product.CanBePurchasedOnAccount)
                                        add = true;

                                    if (product.AgentOnly)
                                    {
                                        if (accountOrder.Agent != null)
                                            add = true;
                                        else
                                            add = false;
                                    }

                                    if (!add)
                                        productsToRemove.Add(product);
                                }
                            }
                        }

                        // Replace SKUs with lines
                        if (resolveSkusToLines)
                        {
                            var sku = product.Role as SkuProductRole;
                            if (sku != null)
                            {
                                productsToRemove.Add(product);
                                if (sku.Owner != null && sku.Owner.Product != null)
                                {
                                    var line = sku.Owner.Product;
                                    if (!results.Contains(line) && !productsToAdd.Contains(line))
                                        productsToAdd.Add(line);
                                }
                            }
                        }
                    }

                    foreach (Product product in productsToAdd)
                    {
                        results.Add(product);
                    }

                    foreach (Product product in productsToRemove)
                    {
                        results.Remove(product);
                    }

                    foreach (var result in results)
                        list.Add(result);
                }
                break;
        }
        return list;
    }
marc_s
Hello World
  • I was considering breaking the search string up into a character array and then checking the database for the string most similar to that array, but I feel this would be extremely inefficient – Hello World Jun 08 '12 at 10:58
  • I could also make it so that if there are, say, 2+ 0's, I replace them all with a wildcard. – Hello World Jun 08 '12 at 11:06
  • I am looking into using regex to detect and replace it – Hello World Jun 08 '12 at 11:11
  • You might want to consider using the Levenshtein distance algorithm for comparing strings for similarity. You'll find an SO question on this [here](http://stackoverflow.com/questions/2344320/comparing-strings-with-tolerance). – Anders Gustafsson Jun 08 '12 at 11:43
  • The only issue with this is that it would have to be run against every single product, which isn't efficient at all. Thank you for the comment though. – Hello World Jun 08 '12 at 11:54
  • I would suggest using a spell checker like http://www.wintertree-software.com/spell-check/csharp/index.html to get suggested words and use them... Possibly show the user "did you mean xyz"... – Peter Ritchie Jun 08 '12 at 13:09
  • I can't do that, as it's only for one client's site and they've asked for it as a custom method. I am going to be using Regex :) – Hello World Jun 08 '12 at 13:22

1 Answer

Fuzzy logic, gotta love it. The way I would do this (which doesn't say much about how it really should be done, but I'll give it my best attempt) would be to build a regular expression out of the search string itself.

Treat the regex builder like a custom-built zipping operation. Start with your character array and build the regex from that: any time you find two identical characters in a row, replace the second one with the '+' character (and skip any further repeats), then run the resulting search using regex rather than exact string matching.

K00000WEFLZ would turn into K0+WEFLZ, and match K, one or more 0's, then WEFLZ. The algorithm as described does this for ANY repeated character, so it can get a little silly: something like KK000WWLEFF22 would become K+0+W+LEF+2+. That is not much better as a search string and might match a lot of things you didn't want, but it is effective. Or you could limit it to only collapse runs of 0, whatever ends up working best.
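As a rough illustration of the "only replace 0" option, here is a minimal sketch. It is a variation on the idea above, not a drop-in for the platform's search: the class and method names are hypothetical, it collapses every run of 0's (even a single 0) into `0+`, escapes all other characters so they are matched literally, and anchors the pattern so it cannot match in the middle of a longer code.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class ZeroTolerantPattern
{
    // Builds an anchored regex in which each run of '0' in the search
    // term matches "one or more zeros" and everything else is literal.
    public static Regex Build(string search)
    {
        var sb = new StringBuilder("^");
        int i = 0;
        while (i < search.Length)
        {
            char c = search[i];
            if (c == '0')
            {
                // Consume the whole run of zeros and emit "0+" once.
                while (i < search.Length && search[i] == '0')
                    i++;
                sb.Append("0+");
            }
            else
            {
                // Escape so characters like '.' or '+' are not treated as regex syntax.
                sb.Append(Regex.Escape(c.ToString()));
                i++;
            }
        }
        sb.Append('$');
        return new Regex(sb.ToString(), RegexOptions.IgnoreCase);
    }
}
```

A term typed with three 0's, such as K000WEFLZ, produces the pattern `^K0+WEFLZ$`, which matches K00000WEFLZ but not KWEFLZ. This only helps where the candidate codes can be tested against a .NET regex, for example an in-memory list.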

The other approach I would recommend is live filtering, but how useful that is depends on how the search is normally used. Will it be more common for users to type the value in, or to copy/paste it from elsewhere? In the second case, live filtering is utterly useless. Otherwise, refilter the list on every KeyPress or TextChanged event. At least then they might notice when the entire list disappears because they entered an extra 0.
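A minimal sketch of the filtering step behind live filtering, assuming the candidate codes are already available in memory (which may not be realistic for a huge catalog). The class and method names are hypothetical; the method would be called from whatever KeyPress or TextChanged handler the UI provides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LiveFilter
{
    // Returns the codes that still start with what the user has typed so far.
    // An empty input keeps the whole list.
    public static List<string> Filter(IEnumerable<string> codes, string typed)
    {
        if (string.IsNullOrEmpty(typed))
            return codes.ToList();

        return codes
            .Where(code => code.StartsWith(typed, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

The point of the design is feedback: as soon as the user types one 0 too many, the list empties, which shows them where the typo is.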

Edit - Add Code sample

// Requires: using System.Text;
// Collapses every run of a repeated character into "<char>+".
// Note: characters are not regex-escaped here.
private string RegStringZipper(string searchString)
{
    StringBuilder sb = new StringBuilder();
    char lastChar = new char();
    bool plusFlag = false;
    foreach (char c in searchString)
    {
        if (sb.Length == 0)
        {
            // First character: always copy it through.
            sb.Append(c);
            lastChar = c;
        }
        else
        {
            if (lastChar == c)
            {
                // We have a repeating character.
                // Note: this is also where you would check for a specific
                // character, like 0, if you only wanted to collapse that one.
                if (!plusFlag)
                {
                    // We have not already added the '+'.
                    sb.Append('+');
                    plusFlag = true;
                }
                // else do nothing, skip the character
            }
            else
            {
                sb.Append(c);
                plusFlag = false;
                lastChar = c;
            }
        }
    }
    return sb.ToString();
}

As for where I would fit it into your code: it really depends on how that search function actually works, and it's not something I've ever played with before. Speaking of which, if it works the way it looks like it might work (wildcards rather than regex), trade out '+' for '*' in the above code.

if (Page.Request.QueryString["search"] != null && Page.Request.QueryString["search"].Length != 0)
{
    bool[] customString = new bool[5] { SearchCustomString1, SearchCustomString2, SearchCustomString3, SearchCustomString4, SearchCustomString5 };
    string searchString = RegStringZipper(Page.Request.QueryString["search"]);
    // Please note: given that I don't know what Fabric.ObjectProvider's Search works on,
    // I don't actually know that this works as intended.
    IList<Product> results = Fabric.ObjectProvider.Get<IProductSearchCommand>().Search(searchString, out searchWords, IncludeSkus, IsPublicFacing, customString, CoreHttpModule.Session);
    // ... the rest of the original block continues unchanged ...
Nevyn
  • Fun question for follow up: any suggestions on how one might sort the RegEx results if there were more than one match (i.e. a relevancy score)? – nicholas Jun 08 '12 at 15:39
  • You know what... I have no idea. Maybe compare the string length of the original search string against each of the results, but if these are product IDs then logic says they are probably all going to be the same length anyway. Getting 'relevance' would be pretty complicated. If anyone else has an idea for that one, I'd like to hear it. – Nevyn Jun 08 '12 at 15:44
  • This is the solution. You want to match a pattern. Regular Expression is built on fuzzy logic. – Security Hound Jun 08 '12 at 19:07
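On the relevancy question raised in the comments, one possible approach (not something proposed in the thread as a solution) is to sort the regex matches by their Levenshtein edit distance to what the user actually typed. Because it runs only over the handful of matches rather than the whole catalog, the cost concern raised earlier is much smaller. The class and method names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchRanking
{
    // Classic two-row Levenshtein distance: the number of single-character
    // insertions, deletions, and substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1),
                    prev[j - 1] + cost);
            }
            // Swap rows so prev always holds the last completed row.
            var tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.Length];
    }

    // Orders matches from closest to furthest from the typed term,
    // breaking ties alphabetically so the order is stable.
    public static List<string> Rank(string typed, IEnumerable<string> matches)
    {
        return matches
            .OrderBy(m => EditDistance(typed, m))
            .ThenBy(m => m, StringComparer.Ordinal)
            .ToList();
    }
}
```

For codes that differ only in the number of 0's, the distance is simply the difference in zero count, so the result whose zero run is closest in length to the typed one comes first.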