4

I am trying to map users across different systems based on their first and last names in Python.

One issue is that the first names are in many cases nicknames. For example, one user's first name is 'Dave' in one system and 'David' in another.

Is there an easy way in Python to convert common nicknames like these to their formal counterparts?

Thanks!

wangke99
  • 2
    This [csv file](http://code.google.com/p/nickname-and-diminutive-names-lookup/) may be of use to you. (Though, unfortunately, it does not list `Dave` as a nickname for `David`.) See also this [SO question](http://stackoverflow.com/q/2381522/190597). – unutbu Nov 28 '12 at 22:50
  • 1
    A dictionary could do it easily, however I don't know where to find a source of nicknames. – Mark Ransom Nov 28 '12 at 22:50

3 Answers

5

Not within Python specifically, but try using this:

http://deron.meranda.us/data/nicknames.txt

If you load that data into Python (for example with `csv.reader(<FileObject>, delimiter='\t')`), you can then use a weighted-probability function to return a full name for any nickname in that list.

You could do something like this:

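import collections
import random

def weighted_choice_sub(weights):
    # Source for this function:
    #  http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/
    # Returns an index chosen with probability proportional to its weight.
    rnd = random.random() * sum(weights)
    for i, w in enumerate(weights):
        rnd -= w
        if rnd < 0:
            return i

def load_names(filename):
    # Build {nickname: [(full name, weight), ...]} from the tab-separated file.
    outdict = collections.defaultdict(list)
    with open(filename, 'r') as infile:
        for line in infile:
            tmp = line.strip().split('\t')
            outdict[tmp[0]].append((tmp[1], float(tmp[2])))
    return outdict


def full_name(nickname, names):
    # `names` is the dictionary returned by load_names(); load it once and reuse it.
    candidates = names.get(nickname)
    if not candidates:
        return nickname  # unknown nickname: return it unchanged
    return candidates[weighted_choice_sub([w for _, w in candidates])][0]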
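
Because the weighted choice is random, the same nickname can resolve to different full names on different calls, which may be awkward when matching users between systems. A deterministic variant could always pick the highest-weight candidate. The sketch below assumes the same (nickname, full name, weight) layout as the code above; `SAMPLE_ROWS`, `build_table`, and `most_likely_full_name` are hypothetical names, and the weights are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows in (nickname, full name, weight) form.
# The values in the real file will differ.
SAMPLE_ROWS = [
    ("dave", "david", 0.9),
    ("dave", "davis", 0.1),
    ("bill", "william", 0.8),
]

def build_table(rows):
    """Group rows into {nickname: [(full name, weight), ...]}."""
    table = defaultdict(list)
    for nickname, full, weight in rows:
        table[nickname].append((full, weight))
    return table

def most_likely_full_name(nickname, table):
    """Return the highest-weight full name, or the input if it is unknown."""
    candidates = table.get(nickname)
    if not candidates:
        return nickname
    return max(candidates, key=lambda pair: pair[1])[0]

table = build_table(SAMPLE_ROWS)
print(most_likely_full_name("dave", table))  # david
print(most_likely_full_name("zed", table))   # zed
```

The trade-off is that a less common full name (here the made-up "davis") is never returned, so this suits cases where a stable answer matters more than covering every possibility.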
jdotjdot
0

You'd have to create a database or hash that maps nicknames onto formal names. If you can find such a list online, implementing the map will be trivial. The real fun will be getting a complete enough list, making sure variations are handled, and avoiding problems when people's formal names ARE their nicknames. Not everyone who goes by Dave has a formal name of David, for example; the person's formal name may very well be Dave.
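
One way to allow for a formal name that is itself a nickname is to treat every first name as a set of candidates that includes the name as given, and to call two records a possible match when their candidate sets overlap. This is only a sketch: `NICKNAME_TO_FORMAL` is a tiny hypothetical table, and the function names are invented for illustration.

```python
# Hypothetical nickname table; a real one would come from an external list.
NICKNAME_TO_FORMAL = {
    "dave": {"david"},
    "davy": {"david"},
    "luke": {"lucas"},
}

def candidate_names(first_name):
    """Return every name this first name could stand for, including itself."""
    key = first_name.strip().lower()
    return {key} | NICKNAME_TO_FORMAL.get(key, set())

def could_be_same_person(first_a, first_b):
    """True when the two first names share at least one candidate name."""
    return bool(candidate_names(first_a) & candidate_names(first_b))

print(could_be_same_person("Dave", "David"))  # True
print(could_be_same_person("Dave", "Dave"))   # True
print(could_be_same_person("Dave", "Luke"))   # False
```

Keeping the original name in the candidate set means a person whose formal name really is Dave still matches another record that says Dave.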

RonaldBarzell
0
In [1]: first_name_dict = {'David':['Dave']}
In [2]: def get_real_first_name(name):
   ...:     for first_name in first_name_dict:
   ...:         if first_name == name:
   ...:             return name
   ...:         elif name in first_name_dict[first_name]:
   ...:             return first_name
   ...:     # No entry matched: return the name unchanged.
   ...:     return name
   ...:

In [3]: get_real_first_name('David')
Out[3]: 'David'

In [4]: get_real_first_name('Dave')
Out[4]: 'David'

I'm using IPython. Basically, you need a dictionary to do this. The first_name_dict maps each formal first name to a list of its nicknames. For example, if David can be called "Dave" or "Davy" and Lucas can be called "Luke", you can write the dictionary like this:

first_name_dict = {'David' : ['Dave', 'Davy'], 'Lucas' : ['Luke']}

You can improve the solution by adding case-insensitive matching.
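
A minimal sketch of case-insensitive matching, assuming the same `first_name_dict` shape as above: the dictionary is inverted once so that each lower-cased nickname is a key, which turns every lookup into a single dictionary access instead of a loop. The name `nickname_to_formal` is introduced here for illustration.

```python
first_name_dict = {'David': ['Dave', 'Davy'], 'Lucas': ['Luke']}

# Invert once: lower-cased nickname -> formal name.
nickname_to_formal = {
    nickname.lower(): formal
    for formal, nicknames in first_name_dict.items()
    for nickname in nicknames
}
# Map formal names to themselves so their canonical spelling is returned.
for formal in first_name_dict:
    nickname_to_formal[formal.lower()] = formal

def get_real_first_name(name):
    """Return the formal name for a nickname, ignoring case and outer spaces."""
    return nickname_to_formal.get(name.strip().lower(), name)

print(get_real_first_name('dave'))   # David
print(get_real_first_name('LUKE'))   # Lucas
print(get_real_first_name('Alice'))  # Alice
```

Names that are not in the table are returned unchanged, matching the behavior of the loop-based version.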

goFrendiAsgard
  • For efficiency's sake you want to reverse the dictionary, with the nickname as the key and the formal name as the value. – Mark Ransom Nov 28 '12 at 22:59