As part of a contact management system, I have a large database of names. People frequently edit it, and as a result the same person ends up existing in different forms (John Smith and Jonathan Smith). I looked into word similarity, but it's easy to think of name variations that are not similar at all (Richard vs. Dick). Is there a list of common English first name variations that I could use to detect and correct such errors?
possible duplicate of [Converting user nickname to formal first name in Python](http://stackoverflow.com/questions/13615789/converting-user-nickname-to-formal-first-name-in-python) – Luke Jul 30 '15 at 17:42
2 Answers
I would crawl all Wikipedia pages on people's names (a dump of Wikipedia data is available), e.g., http://en.wikipedia.org/wiki/Teresa (from http://en.wikipedia.org/wiki/Category:English_given_names), and create an index that you can use to suggest correct forms to people (ranking the suggestions by how often each first-name variant appears in your database). Unfortunately, I do not know of an existing database like this.
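To make the idea concrete, here is a minimal sketch of such an index in Python. It assumes the variant groups have already been extracted somehow; the `VARIANT_GROUPS` data below is hand-written for illustration and is not taken from Wikipedia, and the function names `build_index` and `suggest_forms` are made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical variant groups, written by hand for illustration only.
# In practice these would come from parsing the given-name pages.
VARIANT_GROUPS = [
    {"john", "jonathan", "jon", "johnny"},
    {"richard", "dick", "rick", "richie"},
    {"teresa", "theresa", "tess", "terry"},
]


def build_index(groups):
    """Map every name to the full set of names in its variant group."""
    index = defaultdict(set)
    for group in groups:
        for name in group:
            index[name] |= group
    return index


def suggest_forms(name, index, db_first_names):
    """Return the known variants of `name`, most frequent in the database first."""
    counts = Counter(n.lower() for n in db_first_names)
    key = name.lower()
    candidates = index.get(key, set()) - {key}
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(candidates, key=lambda n: (-counts[n], n))


index = build_index(VARIANT_GROUPS)
db_first_names = ["John", "John", "Jonathan", "Dick", "Richard", "Richard", "Richard"]
print(suggest_forms("Jonathan", index, db_first_names))
print(suggest_forms("Dick", index, db_first_names))
```

Ranking by frequency means the form that already dominates your data is offered first, which is only a heuristic; a person who really is called Jonathan should still be able to decline the suggestion.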

Skarab