5

As part of a contact management system, I have a large database of names. People edit it frequently, and as a result the same person ends up existing in different forms (John Smith and Jonathan Smith). I looked into word similarity, but it's easy to think of name variations that are not similar at all (Richard vs. Dick). Is there a list of common English first-name variations that I could use to detect and correct such errors?

Chris
  • possible duplicate of [Converting user nickname to formal first name in Python](http://stackoverflow.com/questions/13615789/converting-user-nickname-to-formal-first-name-in-python) – Luke Jul 30 '15 at 17:42

2 Answers

3

This thread points to a list of nickname/first name maps from the census:

http://deron.meranda.us/data/nicknames.txt
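
Once you have such a list, one way to use it is to map every first name to its possible formal forms and then flag contacts that share a surname and a formal form. The following is only a minimal sketch: the `NICKNAME_PAIRS` sample, the function names, and the (nickname, formal name) pair layout are all hypothetical, and the actual layout of the linked file is not assumed here.

```python
from collections import defaultdict

# Hypothetical sample pairs (nickname, formal name). A real list would be
# loaded from a file; its exact layout is not assumed in this sketch.
NICKNAME_PAIRS = [
    ("dick", "richard"),
    ("rick", "richard"),
    ("john", "jonathan"),
    ("jon", "jonathan"),
]


def build_lookup(pairs):
    """Map each lowercased nickname to the set of formal names it may stand for."""
    lookup = defaultdict(set)
    for nickname, formal in pairs:
        lookup[nickname.lower()].add(formal.lower())
    return lookup


def formal_forms(first_name, lookup):
    """Return the candidate formal forms of a first name, including the name itself."""
    key = first_name.lower()
    return {key} | lookup.get(key, set())


def possible_duplicates(contacts, lookup):
    """Group (first, last) contacts that share a surname and a formal first-name form."""
    buckets = defaultdict(list)
    for first, last in contacts:
        for formal in formal_forms(first, lookup):
            buckets[(formal, last.lower())].append((first, last))
    # Keep only buckets that contain more than one distinct contact.
    return [sorted(set(group)) for group in buckets.values() if len(set(group)) > 1]
```

Because a nickname can belong to more than one formal name, the groups are best treated as candidates for a human to review rather than as automatic merges.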

Luke
3

I would crawl all the Wikipedia pages on people's names (a dump of Wikipedia data is available), e.g., http://en.wikipedia.org/wiki/Teresa (from http://en.wikipedia.org/wiki/Category:English_given_names), and create an index that you can use to suggest the correct forms to people (you would rank the suggestions by how often each first-name variant occurs in your database). Unfortunately, I do not know of an existing database of this kind.
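
As a rough illustration of the index and ranking idea, here is a small sketch. It assumes the variant groups have already been extracted somehow; the `VARIANT_GROUPS` sample and the function names are hypothetical, and no Wikipedia parsing is shown.

```python
from collections import Counter

# Hypothetical variant groups, standing in for what a crawl might produce.
VARIANT_GROUPS = [
    {"teresa", "theresa", "therese", "tessa"},
    {"richard", "dick", "rick"},
]


def build_index(groups):
    """Map every name to the full group of variants it belongs to."""
    index = {}
    for group in groups:
        frozen = frozenset(group)
        for name in group:
            # If a name appears in several groups, the last group wins here.
            index[name] = frozen
    return index


def suggest(first_name, index, db_first_names):
    """Return variants of first_name found in the database, most frequent first."""
    counts = Counter(name.lower() for name in db_first_names)
    key = first_name.lower()
    variants = index.get(key, frozenset())
    candidates = [v for v in variants if v != key and counts[v] > 0]
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(candidates, key=lambda v: (-counts[v], v))
```

For example, when a user types a new first name, `suggest` would return the variants that already exist in the database, so the user can pick an existing contact instead of creating a near-duplicate.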

Skarab