45

I'm involved with a SQL / .NET project that will be searching through a list of names. I'm looking for a way to return results for similar first names of people. A search for "Tom" should also return Thom, Thomas, etc. It does not matter whether the data source is a file or a web service. Example design:

Table "Names" has Name and NameID
Table "Nicknames" has Nickname, NicknameID and NameID

Example output:

You searched for "John Smith"
You show results Jon Smith, Jonathan Smith, Johnny Smith, ...

Are there any databases out there (public or paid) suited to this type of task to populate a relationship between nicknames and names?

Tom Willwerth
  • 4
    Why the close votes? The requested database is an important resource for this programming project. – Larry Lustig Mar 04 '10 at 18:35
  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. – C8H10N4O2 Apr 27 '17 at 12:10

9 Answers

44

I'm adding another source for anyone who comes across this question via Google. This project provides a very good lookup for this purpose.

https://github.com/carltonnorthern/nickname-and-diminutive-names-lookup

It's somewhat simpler and less complete than pdNickName but on the other hand it's free and easy to use.

Donny V.
Joe Harris
  • 2
    Thank you. Came across this question on Google 5 years later, just as you had planned for. :) – cowsay Oct 12 '16 at 13:45
  • 2
    Some of these entries are pretty questionable. For example, AARON = ERIN and BILLY = FRED – C8H10N4O2 Apr 27 '17 at 16:34
  • I used this source recently and can attest to its usefulness. Based on git commit history, the names CSV file gets updated somewhat regularly (and of course you can't beat the price). – Bill Jul 14 '17 at 22:02
13

A Google search for "Database of Nicknames" turned up pdNickName (for pay).

In addition, I think you only need a single table for this job, not two, with NameID, Name, and MasterNameID. All the nicknames go into the Name column. One name is considered the "canonical" one. All the nickname records use the MasterNameID column to point back to that record, with the canonical name pointing to itself.

Your two table schema contains no additional information and, depending on how you fill in the nickname table, you might need extra code to handle the canonical cases.
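
To make the single-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The column names come from the description above; the sample rows and the `related_names` helper are my own illustration, not part of any real dataset.

```python
import sqlite3

# Hypothetical single-table layout: every spelling lives in Name, and
# MasterNameID points at the row holding the canonical name.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Names (
        NameID       INTEGER PRIMARY KEY,
        Name         TEXT NOT NULL,
        MasterNameID INTEGER NOT NULL REFERENCES Names(NameID)
    )
    """
)

# Illustrative rows only; a real table would be filled from a nickname dataset.
conn.executemany(
    "INSERT INTO Names (NameID, Name, MasterNameID) VALUES (?, ?, ?)",
    [
        (1, "Thomas", 1),  # canonical row points to itself
        (2, "Tom", 1),
        (3, "Thom", 1),
        (4, "Robert", 4),  # canonical row points to itself
        (5, "Bob", 4),
    ],
)


def related_names(conn, name):
    """Return every name that shares a canonical record with `name`, sorted."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.Name
        FROM Names AS searched
        JOIN Names AS other ON other.MasterNameID = searched.MasterNameID
        WHERE searched.Name = ? COLLATE NOCASE
        ORDER BY other.Name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(related_names(conn, "Tom"))  # ['Thom', 'Thomas', 'Tom']
```

Because the canonical row points to itself, the same self-join answers the query whether the search term is a nickname or the canonical name, so no special case is needed.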

Larry Lustig
7

I just found this site.

It looks like you could script it pretty easily.

http://www.behindthename.com/php/extra.php?terms=steve&extra=r&gender=m

I just wish I could automatically narrow this to English.

rh0dium
  • Interesting, and they offer their database for [commercial licensing](http://www.behindthename.com/licensing.php), or via a [free (rate-limited) API](http://www.behindthename.com/api/). The [name detail](http://www.behindthename.com/name/john) pages clearly distinguish variants, diminutives, alternate genders, and other languages; I don't know whether the API provides the same level of detail. They seem to have better international coverage than pdNickname, though the variants seem most comprehensive for European names. – John Mellor Aug 03 '12 at 16:19
  • @JohnMellor documentation for API at your link states that the function to list synonyms for a name is "not currently available" – C8H10N4O2 Apr 27 '17 at 16:33
6

Another commercial name matching database is: http://www.basistech.com/name-indexer/

It looks quite professional (though potentially expensive).

They claim to support the following languages:
Arabic, Chinese (Simplified), Chinese (Traditional), Persian (Farsi / Dari), English, Japanese, Korean, Pashto, Russian, Urdu

John Mellor
4

Here is a GitHub repo with a CSV of related names, and you can contribute back:

The first few lines show the format:

aaron,ron
abel,abe
abednego,bedney
abijah,ab,bige
abigail,ab,abbie,abby,gail
abner,ab,abbie,abby
abraham,abe,abram,bram
absalom,ab,abbie,app
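
A rough sketch of how rows in this shape could be turned into a lookup, assuming the first column is the given name and the remaining columns are its nicknames. The `load_nicknames` and `candidates` helpers are hypothetical names of mine, and the sample is just the rows shown above.

```python
import csv
import io

# Sample rows copied from above; in practice you would read the repo's CSV file.
SAMPLE = """\
aaron,ron
abel,abe
abednego,bedney
abijah,ab,bige
abigail,ab,abbie,abby,gail
abner,ab,abbie,abby
abraham,abe,abram,bram
absalom,ab,abbie,app
"""


def load_nicknames(text):
    """Map each given name (first column) to the set of nicknames after it."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, *nicknames = (cell.strip().lower() for cell in row)
        table.setdefault(name, set()).update(nicknames)
    return table


def candidates(table, query):
    """Return the query plus every name linked to it in either direction."""
    query = query.strip().lower()
    found = {query}
    for name, nicknames in table.items():
        if query == name or query in nicknames:
            found.add(name)
            found.update(nicknames)
    return sorted(found)


table = load_nicknames(SAMPLE)
print(candidates(table, "abby"))
# ['ab', 'abbie', 'abby', 'abigail', 'abner', 'gail']
```

Note that a shared nickname pulls in every name that uses it ("abby" brings back both abigail and abner), so the result is a candidate set to rank or filter, not a list of exact equivalents.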
Stan James
2

There is a database called pdNickName (found at http://www.peacockdata2.com/products/pdnickname/). It contains everything you need, at a cost of $500.

Christopher Richa
  • How would you go about getting all the possible patterns? Take the Robert to Bob example: I can't use "like %ob%" because that will match too many. – Tom Willwerth Mar 04 '10 at 18:05
  • In that case you would need a separate table, holding an ID for each nickname to link the real names and nicknames together. – Christopher Richa Mar 04 '10 at 18:08
  • Yes, that is my question: is there a public source of data that I could use to populate the relation between name and nickname? – Tom Willwerth Mar 04 '10 at 18:12
  • 1
    Well I have found this database: http://www.peacockdata2.com/products/pdnickname/ It is not free ($500) and it has an Excel sheet in the sample download that shows you a sample of the database contents. – Christopher Richa Mar 04 '10 at 18:19
  • This link looks promising, you should make this a new answer – Tom Willwerth Mar 04 '10 at 18:33
2

Similar format to Stan James's CSV, but folded two ways for lookups:

Name to nickname: https://github.com/MrCsabaToth/SOEMPI/blob/master/openempi/conf/name_to_nick.csv

Nickname to name: https://github.com/MrCsabaToth/SOEMPI/blob/master/openempi/conf/nick_to_name.csv
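
If you only have the data in one direction, the other fold can be derived in a few lines. This is a sketch only: the `name_to_nick` dictionary below is a made-up subset, the `fold` helper is a hypothetical name, and it does not reproduce the exact layout of the two files.

```python
from collections import defaultdict

# Illustrative name-to-nickname data (hypothetical subset, not the real file).
name_to_nick = {
    "abel": ["abe"],
    "abraham": ["abe", "abram", "bram"],
    "abigail": ["abby", "gail"],
}


def fold(mapping):
    """Invert a one-to-many mapping so lookups can run in the other direction."""
    inverted = defaultdict(list)
    for key, values in mapping.items():
        for value in values:
            inverted[value].append(key)
    return {key: sorted(values) for key, values in inverted.items()}


nick_to_name = fold(name_to_nick)
print(nick_to_name["abe"])  # ['abel', 'abraham']
```

Keeping both directions precomputed trades a little storage for a direct dictionary lookup in either direction.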

Csaba Toth
0

To select similar-sounding names, use SOUNDEX (see MSDN):

SELECT SOUNDEX ('Tom')
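
To see what this kind of phonetic code does and does not match, here is a small Python sketch of standard American Soundex. It is my own illustration, not SQL Server's implementation, and T-SQL's SOUNDEX may differ from it in edge cases.

```python
def soundex(name):
    """Return the four-character American Soundex code for `name`."""
    codes = {}
    for group, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    ):
        for letter in group:
            codes[letter] = digit

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    previous = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = codes.get(ch, "")  # vowels and Y have no code
        if digit and digit != previous:
            result += digit
        previous = digit  # a vowel resets this, so a repeated code after it counts
    return (result + "000")[:4]  # pad with zeros, keep letter plus three digits


for name in ("Tom", "Thom", "Robert", "Bob"):
    print(name, soundex(name))
# Tom T500
# Thom T500
# Robert R163
# Bob B100
```

Spelling variants such as Tom and Thom collapse to the same code, but a nickname that starts with a different letter, such as Bob for Robert, never can, since the first letter is kept as-is.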
Dustin Laine
  • 5
    Soundex isn't really meant for first names. And beyond that, (SOUNDEX("Robert") = 'R163') != (SOUNDEX("Bob") = 'B100'), etc. – Doug McClean Mar 04 '10 at 17:58
  • 4
    Doug's point is critical here. The soundex works for Thom to Tom but not Robert to Bob. – Tom Willwerth Mar 04 '10 at 18:00
  • 1
    Or Margaret to Peggy. A lookup is necessary. – bmb Mar 04 '10 at 18:11
  • Well, Robert to Bob is a good catch, but Margaret to Peggy? Come on, look at his question: he asked for "similar". How is that similar? And a downvote for it? I don't think that is justified, as my answer would work for his question. – Dustin Laine Mar 04 '10 at 18:18
  • 1
    @durilai I tried to clarify as soon as possible for you, and your answer made me think, so I'm not the one downvoting you. By "similar" I did not mean "similar sounding"; I meant "the same" or "related". – Tom Willwerth Mar 04 '10 at 18:22
  • How about Edward/Ted/Theo? How about Henry/Hank? Richard/Dick? The point is that there are a lot of common nicknames that don't work by "sound", and the OP knows that so he asked for a database. If I knew of one, I would suggest one because we looked for the same thing last year. – bmb Mar 04 '10 at 18:25
  • @Tom Willwerth, no worries. Just thought it fit your need, before update. – Dustin Laine Mar 04 '10 at 18:40
0

This is a good choice: https://github.com/onyxrev/common_nickname_csv

id, name, nickname
1, Aaron, Erin
2, Aaron, Ron
3, Aaron, Ronnie
4, Abel, Ab
5, Abel, Abe
6, Abel, Eb
7, Abel, Ebbie
8, Abiel, Ab
9, Abigail, Abby
10, Abigail, Gail
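
Rows in this id/name/nickname shape map fairly directly onto the two-table design from the question. Below is a minimal sketch with Python's sqlite3 and csv modules, assuming the header and the comma-plus-space layout shown above; the `nicknames_for` helper is a hypothetical name, and the sample is only the first few rows.

```python
import csv
import io
import sqlite3

# First rows of the sample above; in practice read the repo's CSV file instead.
SAMPLE = """\
id, name, nickname
1, Aaron, Erin
2, Aaron, Ron
3, Aaron, Ronnie
4, Abel, Ab
5, Abel, Abe
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Names (NameID INTEGER PRIMARY KEY, Name TEXT UNIQUE NOT NULL);
    CREATE TABLE Nicknames (
        NicknameID INTEGER PRIMARY KEY,
        Nickname   TEXT NOT NULL,
        NameID     INTEGER NOT NULL REFERENCES Names(NameID)
    );
    """
)

# skipinitialspace drops the blank that follows each comma in this file.
for row in csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True):
    # Insert the name once, then look up its ID for the nickname row.
    conn.execute("INSERT OR IGNORE INTO Names (Name) VALUES (?)", (row["name"],))
    (name_id,) = conn.execute(
        "SELECT NameID FROM Names WHERE Name = ?", (row["name"],)
    ).fetchone()
    conn.execute(
        "INSERT INTO Nicknames (NicknameID, Nickname, NameID) VALUES (?, ?, ?)",
        (int(row["id"]), row["nickname"], name_id),
    )


def nicknames_for(conn, name):
    """Return the nicknames recorded for `name`, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT n.Nickname
        FROM Nicknames AS n
        JOIN Names AS m ON m.NameID = n.NameID
        WHERE m.Name = ? COLLATE NOCASE
        ORDER BY n.Nickname
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(nicknames_for(conn, "aaron"))  # ['Erin', 'Ron', 'Ronnie']
```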
Rick