5

I would like to map the items of a list to integers, so that equal values always get the same number.

For example:

['aa', 'b', 'b', 'c', 'aa', 'b', 'a'] -> [0, 1, 1, 2, 0, 1, 3]

I'm trying to achieve this by using numpy and a mapping dict.

import numpy as np

def number(lst):
    x = np.array(lst)
    unique_names = list(np.unique(x))  # sorted unique values
    mapping = dict(zip(unique_names, range(len(unique_names)))) # Translating dict
    map_func = np.vectorize(lambda name: mapping[name])  # look each name up in the dict
    return map_func(x)

Is there a more elegant / faster way to do this?

Update: Bonus question -- do it with the order maintained.

blancmange
  • Possible duplicate of [Convert alphabet letters to number in Python](http://stackoverflow.com/questions/4528982/convert-alphabet-letters-to-number-in-python) – Chankey Pathak Apr 14 '17 at 09:53
  • They are not necessarily letters... I mean the general case – blancmange Apr 14 '17 at 10:25
  • But you also seem to want to keep the order, judging by the expected sample output, and the func `number` doesn't achieve that, right? – Divakar Apr 14 '17 at 12:37
  • @Divakar Well, at first I didn't worry about that too much. Now I find it worth considering. – blancmange Apr 14 '17 at 15:40

5 Answers

3

You can use the return_inverse keyword:

x = np.array(['aa', 'b', 'b', 'c', 'aa', 'b', 'a'])
uniq, map_ = np.unique(x, return_inverse=True)
map_
# array([1, 2, 2, 3, 1, 2, 0])

Edit: Order preserving version:

x = np.array(['aa', 'b', 'b', 'c', 'aa', 'b', 'a'])
uniq, idx, map_ = np.unique(x, return_index=True, return_inverse=True)
mxi = idx.max()+1
mask = np.zeros((mxi,), bool)
mask[idx] = True
oidx = np.where(mask)[0]
iidx = np.empty_like(oidx)
iidx[map_[oidx]] = np.arange(oidx.size)
iidx[map_]
# array([0, 1, 1, 2, 0, 1, 3])
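
The same order-preserving result can probably be reached a little more directly by ranking the first-occurrence indices instead of building a mask. This is only a sketch under that assumption; the function name `ordered_codes` and its local variable names are made up for illustration and are not part of NumPy.

```python
import numpy as np

def ordered_codes(values):
    """Label each item by the order in which its value first appears."""
    x = np.asarray(values)
    # first_idx[k]: position of the first occurrence of the k-th sorted unique value
    # inverse[i]:   index of x[i] within the sorted unique values
    _, first_idx, inverse = np.unique(x, return_index=True, return_inverse=True)
    # Rank the unique values by where they first occur in x
    order = np.argsort(first_idx)
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    # Translate sorted-unique indices into first-appearance ranks
    return rank[inverse]

print(ordered_codes(['aa', 'b', 'b', 'c', 'aa', 'b', 'a']))
# [0 1 1 2 0 1 3]
```

Like the version above, this still sorts once inside `np.unique`; the only extra work is an `argsort` over the unique values.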
Paul Panzer
  • I do the same thing and `map_` prints as `array([1, 2, 2, 3, 1, 2, 0])` – kmario23 Apr 14 '17 at 10:51
  • any idea how to apply this row-wise for a 2d matrix? – Josh.F Feb 21 '19 at 20:35
  • @Josh.F Do you mean looking for unique rows? Then you can use the `axis` keyword of `np.unique`. Or do you mean process each row separately? In that case--if you want to use this solution--you'd have to loop over rows. Or you could look at one of the other answers. Maybe @Divakar's could be fully vectorized in that case? – Paul Panzer Feb 21 '19 at 22:03
  • The second one, where each row has `np.unique(..., return_inverse=True)` applied to it (or something to the same effect), I'll take a look at his solution, thanks :) – Josh.F Feb 21 '19 at 23:26
2

Here's a vectorized NumPy based solution -

def argsort_unique(idx):
    # Original idea : http://stackoverflow.com/a/41242285/3293881 by @Andras
    n = idx.size
    sidx = np.empty(n,dtype=int)
    sidx[idx] = np.arange(n)
    return sidx

def map_uniquetags_keep_order(a):
    arr = np.asarray(a)

    sidx = np.argsort(arr)
    s_arr = arr[sidx]

    m = np.concatenate(( [True], s_arr[1:] != s_arr[:-1] ))
    unq = s_arr[m]
    tags = np.searchsorted(unq, arr)
    rev_idx = argsort_unique(sidx[np.searchsorted(s_arr, unq)].argsort())
    return rev_idx[tags]

Sample run -

In [169]: a = ['aa', 'b', 'b', 'c', 'aa', 'b', 'a'] # String input

In [170]: map_uniquetags_keep_order(a)
Out[170]: array([0, 1, 1, 2, 0, 1, 3])

In [175]: a = [4, 7, 7, 5, 4, 7, 2]                 # Numeric input

In [176]: map_uniquetags_keep_order(a)
Out[176]: array([0, 1, 1, 2, 0, 1, 3])
Divakar
1

Use sets to remove duplicates:

myList = ['a', 'b', 'b', 'c', 'a', 'b']
mySet = set(myList)

Then build your dictionary using comprehension:

mappingDict = {letter:number for number,letter in enumerate(mySet)}
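
To get the numbers themselves, the dictionary still has to be applied to the original list. A minimal sketch of that last step follows; the helper name `apply_mapping` is hypothetical. Note that a set has no defined order, so which number each value receives is arbitrary and may differ between runs. Only the consistency (equal values get equal numbers) is guaranteed.

```python
def apply_mapping(items):
    """Return (codes, mapping) where equal items share one integer code."""
    unique_items = set(items)  # duplicates removed, order arbitrary
    mapping = {value: code for code, value in enumerate(unique_items)}
    codes = [mapping[value] for value in items]  # look up every item
    return codes, mapping

codes, mapping = apply_mapping(['a', 'b', 'b', 'c', 'a', 'b'])
print(codes)    # three distinct codes from {0, 1, 2}; exact assignment varies
```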
ma3oun
0

I did it using the ASCII values because it is easy and short.

def number(lst):
    # ord('a') == 97, so 'a' -> 0, 'b' -> 1, ...; only works for single lowercase letters
    return [ord(x) - 97 for x in lst]

l = ['a', 'b', 'b', 'c', 'a', 'b']
print(number(l))

Output:

[0, 1, 1, 2, 0, 1]

Ujjwal Aryal
0

If the order is not a concern:

[sorted(set(x)).index(item) for item in x]

# returns:
[1, 2, 2, 3, 1, 2, 0]
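
If the order is a concern, one possible pure-Python sketch is to hand out the next free number the first time each value is seen. The function name `number_in_order` is made up for illustration. It also avoids rebuilding the sorted set for every element, which the one-liner above does.

```python
def number_in_order(items):
    """Number items by first appearance: first new value -> 0, next -> 1, ..."""
    codes = {}   # value -> assigned number
    result = []
    for item in items:
        if item not in codes:
            codes[item] = len(codes)  # next unused number
        result.append(codes[item])
    return result

print(number_in_order(['aa', 'b', 'b', 'c', 'aa', 'b', 'a']))
# [0, 1, 1, 2, 0, 1, 3]
```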
James