
I've looked into fuzzy searching and the Levenshtein distance algorithm, but I'm not sure either is a true fit for what I'm doing. Please let me know your thoughts, if any.

How can I take a user's full name and generate a list of similar names? I want to prevent a user from creating multiple accounts in an application by showing a "Hey, are you sure none of these are you?" prompt as a final step before account creation.

I've found this question, but its answers are entirely SQL-based: http://stackoverflow.com/questions/988050/matching-records-based-on-person-name

I'm using C# / LINQ and SQL Server.

Thanks for your time!

Dori
Mark
  • My recommendation would be to look at `SOUNDEX()`, but that's also a SQL solution which you seem against using. – Yuck Jul 19 '11 at 16:41
  • Yes, I'd prefer to keep it in c# – Mark Jul 19 '11 at 16:43
  • You could also just use the SQL answer that you've posted. Just create a stored procedure and call it from LINQ. – Oskar Kjellin Jul 19 '11 at 16:43
  • What about adding "rules", so to speak? If John Doe is entered, I strip spaces and non-alpha characters and end up with 'johndoe'. I apply a length "rule" saying the queried names need to be within +/- x characters of the length of 'johndoe'. I can also apply a similar-character threshold of, say, 80%. Any thoughts? – Mark Jul 19 '11 at 16:54
  • Just FYI, if you are implementing a publicly available application, displaying existing usernames is not a good idea for security reasons. – Win Jul 19 '11 at 17:11

2 Answers


Here is a link to a SOUNDEX implementation in .NET:

http://www.codeproject.com/KB/recipes/soundex.aspx

I haven't used it, but it seems to be well rated.
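For a sense of what such an implementation involves, here is a minimal sketch of the classic American Soundex rules in C#. It is not the code from the linked article; the class name `Soundex` and method `Encode` are hypothetical, it handles only the letters A to Z, and it may differ from SQL Server's `SOUNDEX()` in edge cases.

```csharp
using System;
using System.Text;

public static class Soundex
{
    // Digit for each letter A..Z; '0' marks letters that carry no code
    // (vowels, Y, H and W).
    private const string Codes = "01230120022455012623010202";

    // Returns a 4-character code such as "R163", or "" if the input has no letters.
    public static string Encode(string name)
    {
        var result = new StringBuilder(4);
        char lastCode = '0';

        foreach (char raw in name.ToUpperInvariant())
        {
            if (raw < 'A' || raw > 'Z') continue;   // skip spaces, digits, punctuation
            char code = Codes[raw - 'A'];

            if (result.Length == 0)
            {
                result.Append(raw);                  // the first letter is kept as-is
            }
            else if (code != '0' && code != lastCode)
            {
                result.Append(code);                 // new consonant group
            }

            // H and W do not separate letters with the same code; vowels do.
            if (result.Length == 1 || (raw != 'H' && raw != 'W'))
            {
                lastCode = code;
            }

            if (result.Length == 4) break;
        }

        return result.Length == 0 ? string.Empty : result.ToString().PadRight(4, '0');
    }
}
```

Two names are then treated as sounding alike when their codes are equal, for example `Soundex.Encode("John") == Soundex.Encode("Jon")`. Soundex compares pronunciation rather than spelling, so a longer variant such as "Johnathan" gets a different code from "John".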

slolife

If it were me, I would require an exact match on the last name, and then only try to guess variants of the first name. This would narrow down the set of names you have to compare quite a bit.

Then, as you suggested in your comments, you could require the first name length to be within +/- a few characters and require that a threshold of, say, 80% of the characters match.

You can also restrict the search to first names that share the same first X characters, since most English name variations diverge only after the first few characters.

Example:

  • John Doe
  • Johnny Doe
  • Johnathan Doe
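
The rules above could be sketched in C# roughly as follows. This is an illustration under assumptions rather than a tested production matcher: the class `SimilarNames`, its method names, and the parameters `prefixLength`, `maxLengthDelta` and `minSimilarity` are all hypothetical, and Levenshtein distance is used as just one possible reading of "percentage of characters that match".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimilarNames
{
    // Levenshtein edit distance, computed with two rolling rows.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp;   // reuse the two rows
        }
        return prev[b.Length];
    }

    // Similarity in [0, 1]; 1.0 means the strings are identical.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / longest;
    }

    // Assumption: the first token is the first name and the last token is the last name.
    private static string[] FirstAndLast(string fullName)
    {
        var tokens = fullName.Trim().ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 0) return new[] { "", "" };
        return new[] { tokens[0], tokens[tokens.Length - 1] };
    }

    public static List<string> FindSimilar(string fullName, IEnumerable<string> existing,
        int prefixLength, int maxLengthDelta, double minSimilarity)
    {
        var target = FirstAndLast(fullName);
        return existing.Where(name =>
        {
            var candidate = FirstAndLast(name);

            // Rule 1: exact match on the last name.
            if (candidate[1] != target[1]) return false;

            // Rule 2: first name length within +/- maxLengthDelta characters.
            if (Math.Abs(candidate[0].Length - target[0].Length) > maxLengthDelta) return false;

            // Rule 3: same leading characters (compares fewer if a name is shorter).
            int p = Math.Min(prefixLength, Math.Min(candidate[0].Length, target[0].Length));
            if (candidate[0].Substring(0, p) != target[0].Substring(0, p)) return false;

            // Rule 4: similarity threshold on the first name.
            return Similarity(candidate[0], target[0]) >= minSimilarity;
        }).ToList();
    }
}
```

For example, `SimilarNames.FindSimilar("John Doe", names, 3, 5, 0.4)` over a list holding the three names above plus "Jane Doe" and "John Smith" keeps only the three example names. With this measure, John vs. Johnny scores about 0.67 and John vs. Johnathan about 0.44, so a strict 80% threshold would reject both; the cut-off needs tuning against real data. A custom method like this runs in memory, so one option is to filter on the last name in the database query first and apply the remaining rules to that smaller result set.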
landoncz