
I'm dealing with a problem that I can't wrap my head around and could use your help and expertise.

I have a textbox that lets the user search for another user using any of the name formats listed below:

  • <first name><space><last name> (John Smith)
  • <last name><comma><space|nospace><first name> (Smith, John) or (Smith,John)
  • The beginning of either the first or the last name, e.g. (Smith), (John), (Sm), or (Jo); in this case, I search against both the first and last name columns
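For reference, the three formats above could be told apart with a small parser along these lines. This is only a sketch under assumptions: `NameQuery` and `NameQueryParser` are hypothetical names, and "First Last" is split at the first space, so "de la" still comes out as first name "de" plus last name "la", which is exactly the ambiguity at issue here.

```csharp
using System;

// Hypothetical result type: which name parts were supplied (null = not supplied).
public sealed class NameQuery
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Either { get; set; }   // single token: match first OR last name
}

public static class NameQueryParser
{
    // Classifies the formats listed in the question:
    //   "John Smith"                 -> FirstName "John", LastName "Smith"
    //   "Smith, John" / "Smith,John" -> LastName "Smith", FirstName "John"
    //   "Smith" / "Jo"               -> Either "Smith" / "Jo"
    // A trailing comma ("de la,") yields a last-name-only query.
    public static NameQuery Parse(string input)
    {
        var text = (input ?? string.Empty).Trim();
        if (text.Length == 0) return null;

        int comma = text.IndexOf(',');
        if (comma >= 0)
        {
            var last = text.Substring(0, comma).Trim();
            var first = text.Substring(comma + 1).Trim();
            if (last.Length == 0 && first.Length == 0) return null;
            return new NameQuery
            {
                LastName = last.Length > 0 ? last : null,
                FirstName = first.Length > 0 ? first : null
            };
        }

        // No comma: the first space separates first name from last name,
        // so "de la" is read as first "de" + last "la" (the ambiguous case).
        int space = text.IndexOf(' ');
        if (space >= 0)
        {
            return new NameQuery
            {
                FirstName = text.Substring(0, space),
                LastName = text.Substring(space + 1).Trim()
            };
        }

        return new NameQuery { Either = text };
    }
}
```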

Issue: Quite a few users have a space in their last name, and someone searching for them may enter only "de la".

In this scenario, because there is a space between the words, the system assumes the search is for a first name starting with "de" and a last name starting with "la". However, the user most likely intended to search only for someone whose last name starts with "de la".

The system works as expected if the user types "de la," because the input now contains a comma, so the system knows for sure that the search is for a last name. But I have to assume that not everyone will type a trailing comma.

Current options: I have a few options in mind and could use your help deciding which one you would recommend. And please, feel free to add your own suggestions.

  • User training. We can always create help guides/training material advising users to add a comma at the end when searching for a last name that contains a space. I don't like this approach because the user experience is no longer smart/intuitive, and most users won't read the help guides.
  • Create two separate text boxes (one for first name, one for last name). I'm not a fan of this approach either: the UI won't look and feel the same, and it will be inconvenient for users who just want to paste a full name from Outlook or elsewhere without copying the first and last names separately.

  • Run the search with the first-name/last-name interpretation first, then additionally search for people whose last name contains a space, and append both result sets to the return value. This might work, but it will create a lot of false positives and add load on the server. E.g., a search for "de la" will return "Lance, Devon" (...) as well as "De La Cruz, John" (...).
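A minimal sketch of the third option, assuming an in-memory `IEnumerable<Person>` (the `Person` type and `CombinedNameSearch` class are hypothetical). Both interpretations go into a single predicate, which against a database would correspond to one query with an `OR` rather than two round trips, and rows whose last name starts with the whole input are ordered first. It does not remove the false positives mentioned above; it only ranks them lower.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; substitute your own user type.
public sealed class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class CombinedNameSearch
{
    public static List<Person> Search(IEnumerable<Person> people, string input)
    {
        // Collapse runs of spaces so "de  la" behaves like "de la".
        var text = string.Join(" ", (input ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        if (text.Length == 0) return new List<Person>();

        int space = text.IndexOf(' ');
        if (space < 0)
        {
            // Single token: prefix of either column.
            return people
                .Where(p => Starts(p.FirstName, text) || Starts(p.LastName, text))
                .ToList();
        }

        var first = text.Substring(0, space);
        var rest = text.Substring(space + 1);

        // One predicate, two interpretations:
        //   (a) first name starts with the first token AND last name with the rest
        //   (b) last name starts with the whole input ("de la" -> "De La Cruz")
        return people
            .Where(p => (Starts(p.FirstName, first) && Starts(p.LastName, rest))
                        || Starts(p.LastName, text))
            // Assumption: whole-input last-name matches are the likelier intent,
            // so list them first (OrderByDescending on a bool puts true first;
            // the sort is stable, so ties keep their original order).
            .OrderByDescending(p => Starts(p.LastName, text))
            .ToList();
    }

    static bool Starts(string value, string prefix) =>
        value != null && value.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
}
```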

I'd appreciate any feedback you can offer on this issue: your experiences, best practices, or, best of all, code snippets from something you've worked on that dealt with this scenario.

Application background: It's an ASP.NET (4.0) WebAPI service written in C#; it's consumed by a client running on a different server.

hsandhar
  • Regular expressions would be a good start. – Evan Mulawski May 28 '15 at 22:36
  • Can you give me an example where regex will help in this scenario? I'm already using regex to check the space/commas etc and am open to incorporating any new ideas. – hsandhar May 28 '15 at 22:39
  • You could use something like `(([a-zA-Z-']+)\s?([a-zA-Z-' ]+)+|([a-zA-Z-' ]+)(\s?,\s?|\s)([a-zA-Z-' ]+))`. Play around with it on regexr.com. – Evan Mulawski May 28 '15 at 22:51
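To illustrate the regex suggestion, here is a hypothetical, simplified C# pattern (not the one quoted in the comment) that uses named groups to tell "Last, First", "First Last", and a single token apart. Note what it cannot do: without a comma, "de la" still matches the "First Last" alternative, so a regex can classify the format but cannot resolve the ambiguity on its own.

```csharp
using System.Text.RegularExpressions;

public static class NamePattern
{
    // Hypothetical pattern. Alternatives are tried left to right:
    //   1. "Last, First" / "Last,First" / "Last,"  -> groups cLast, cFirst
    //   2. "First Last..." (first space splits)     -> groups first, last
    //   3. a single token                           -> group either
    static readonly Regex Pattern = new Regex(
        @"^\s*(?:(?<cLast>[^,]*?)\s*,\s*(?<cFirst>.*?)|(?<first>\S+)\s+(?<last>.+?)|(?<either>\S+))\s*$");

    // Treat a group that did not participate, or captured "", as "not supplied".
    static string Val(Group g) => g.Success && g.Value.Length > 0 ? g.Value : null;

    public static (string First, string Last, string Either) Parse(string input)
    {
        var m = Pattern.Match(input ?? string.Empty);
        if (!m.Success) return (null, null, null);
        if (m.Groups["cLast"].Success)
            return (Val(m.Groups["cFirst"]), Val(m.Groups["cLast"]), null);
        if (m.Groups["first"].Success)
            return (Val(m.Groups["first"]), Val(m.Groups["last"]), null);
        return (null, null, Val(m.Groups["either"]));
    }
}
```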

2 Answers


I've used this technique for a number of years and I like it.

Lose the comma; no one will use it. If there is no space, search for first name OR last name. If there is a space, search for first name AND last name. This code works very well for partial name searches: e.g., "J S" finds Jane Smith and John Smith, and "John" finds both "John Smith" and "Anne Johnson". It should give you a good starting point to get as fancy as you want with your supported queries.

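```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumes PeopleRepository is an IEnumerable<People> (LINQ to Objects). If it is an
// Entity Framework query, the StartsWith(string, StringComparison) overload may not
// be translatable to SQL.
public IEnumerable<People> Search(string query, int maxResults = 20)
{
    if (string.IsNullOrWhiteSpace(query))
    {
        return new List<People>();
    }

    IEnumerable<People> results;

    // Split on spaces, ignoring repeated spaces.
    var split = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    if (split.Length > 1)
    {
        // Space present: first token is the first name, the rest is the last name (AND).
        var firstName = split[0];
        var lastName = string.Join(" ", split.Skip(1));

        results = PeopleRepository.Where(x =>
            x.FirstName != null && x.FirstName.StartsWith(firstName, StringComparison.OrdinalIgnoreCase) &&
            x.LastName != null && x.LastName.StartsWith(lastName, StringComparison.OrdinalIgnoreCase));
    }
    else
    {
        // Single token: match the start of either the first or the last name (OR).
        var search = split[0];
        results = PeopleRepository.Where(x =>
            (x.FirstName != null && x.FirstName.StartsWith(search, StringComparison.OrdinalIgnoreCase)) ||
            (x.LastName != null && x.LastName.StartsWith(search, StringComparison.OrdinalIgnoreCase)));
    }

    return results.Take(maxResults);
}
```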
ManOVision
  • You can handle `LastName FirstName` searches using an extension I found [here](https://stackoverflow.com/questions/1779129/how-to-take-all-but-the-last-element-in-a-sequence-using-linq?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa): `...var lastName = ...`, then `var altFirstName = split[split.Count() - 1]` and `var altLastName = string.Join(" ", split.AllButLast())...`, and finally `.Where(... (x.FirstName.... && x.LastName...) || (x.FirstName.StartsWith(altFirstName... x.LastName.StartsWith(altLastName...)`. It won't work well with complex first names that contain spaces. – TabsNotSpaces May 24 '18 at 21:38
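A sketch of the comment's idea, assuming an in-memory `IEnumerable<Person>` (`Person` and `EitherOrderSearch` are hypothetical names). `Take(split.Length - 1)` stands in for the linked `AllButLast()` extension. As the comment says, a first name that itself contains a space will still be split incorrectly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for illustration.
public sealed class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class EitherOrderSearch
{
    public static List<Person> Search(IEnumerable<Person> people, string query)
    {
        var split = (query ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (split.Length == 0) return new List<Person>();

        if (split.Length == 1)
        {
            // Single token: prefix of either column, as in the answer.
            return people
                .Where(p => Starts(p.FirstName, split[0]) || Starts(p.LastName, split[0]))
                .ToList();
        }

        // "First Last...": first token is the first name.
        var firstName = split[0];
        var lastName = string.Join(" ", split.Skip(1));

        // "...Last First": last token is the first name; Take(Length - 1)
        // does on an array what the AllButLast() extension does.
        var altFirstName = split[split.Length - 1];
        var altLastName = string.Join(" ", split.Take(split.Length - 1));

        return people
            .Where(p => (Starts(p.FirstName, firstName) && Starts(p.LastName, lastName))
                     || (Starts(p.FirstName, altFirstName) && Starts(p.LastName, altLastName)))
            .ToList();
    }

    static bool Starts(string value, string prefix) =>
        value != null && value.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
}
```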

Maybe the real point is that you should index your user data so that you can search it efficiently.

For example, you could index first and last names without caring whether they're first or last names. You want to search for people; why should the end user care about the order of the search terms?

The index can map each name (first or last) to the set of user ids that have that name. If user ids are integers, it would look something like this:

```
John => 12, 19, 1929, 349, 1, 29
Smith => 12, 349, 11, 4
Matias => 931, 45
Fidemraizer => 931
```

This way the user can type names in any order and you no longer care about ordering: if the user types "John", you show every user whose id is in the John set. If they type "John Smith", you intersect the John and Smith sets to find the user ids that are in both, and so on.
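A minimal in-memory sketch of this index, assuming integer user ids; `NameIndex` is a hypothetical name, and the prefix matching (so partial input like "Jo" or "Sm" still works) is an addition beyond the answer's exact-name sets. A real deployment would keep the index in the database or a search engine rather than in a `Dictionary`.

```csharp
using System;
using System.Collections.Generic;

// Each name token (first or last) maps to the set of user ids containing it.
public sealed class NameIndex
{
    private readonly Dictionary<string, HashSet<int>> _index =
        new Dictionary<string, HashSet<int>>(StringComparer.OrdinalIgnoreCase);

    public void Add(int userId, string firstName, string lastName)
    {
        // "De La Cruz" contributes the tokens "De", "La" and "Cruz".
        var tokens = (firstName + " " + lastName)
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            if (!_index.TryGetValue(token, out var ids))
            {
                ids = new HashSet<int>();
                _index[token] = ids;
            }
            ids.Add(userId);
        }
    }

    // Every query token must prefix-match some name token of the user,
    // so the order in which the tokens were typed does not matter.
    public HashSet<int> Search(string query)
    {
        var tokens = (query ?? string.Empty)
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
        HashSet<int> result = null;
        foreach (var token in tokens)
        {
            // Union of the id sets of all keys starting with this token
            // (a linear scan over keys; fine for a sketch, not at scale).
            var matches = new HashSet<int>();
            foreach (var entry in _index)
                if (entry.Key.StartsWith(token, StringComparison.OrdinalIgnoreCase))
                    matches.UnionWith(entry.Value);

            // Intersect across tokens: "John Smith" = John set AND Smith set.
            if (result == null) result = matches;
            else result.IntersectWith(matches);
        }
        return result ?? new HashSet<int>();
    }
}
```

Because each query token is matched independently, "Smith John" and "John Smith" return the same ids. With prefix matching, "de la" would still match someone like Devon Lance as well as a De La Cruz; the index removes the ordering problem, not every false positive.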

I don't know which database technology you're currently using; both SQL and NoSQL products can be a good store for this, but NoSQL will work better.

Matías Fidemraizer