15

I have a numpy array with hundreds of elements, all capital letters, in no particular order:

import numpy as np
abc_array = np.array(['B', 'D', 'A', 'F', 'H', 'I', 'Z', 'J', ...])

Each element in this numpy.ndarray is a numpy.string_.

I also have a "translation dictionary", with key/value pairs such that each capital letter corresponds to a city:

transdict = {'A': 'Adelaide', 'B': 'Bombay', 'C': 'Cologne',...}

There are only 26 pairs in the dictionary transdict, but there are hundreds of letters in the numpy array I must translate.

What is the most efficient way to do this?

I have considered using numpy.core.defchararray.replace(a, old, new, count=None), but this raises a ValueError, as the numpy array is a different size than the dictionary keys/values.

AttributeError: 'numpy.ndarray' object has no attribute 'translate'

ShanZhengYang
  • Did you try out any code that works, be it *inefficient*? What's the expected output for the sample data? – Divakar Nov 04 '15 at 18:44
  • @Divakar Actually, my best guess was to use `numpy.core.defchararray.replace()`, but that doesn't work. So yes, I don't know what to do. – ShanZhengYang Nov 04 '15 at 18:46
  • For `old`, you need to use `transdict.keys()`, and for `new` you need to use `transdict.values()`. Then `.replace(abc_array, old, new)` should work – Chad S. Nov 04 '15 at 18:53
  • @ChadS. I thought so too, but it doesn't. `abc_array` is a numpy array shaped (700,1), `old` and `new` are numpy arrays both shaped (26,1) – ShanZhengYang Nov 04 '15 at 18:58
  • @ChadS. I get a broadcasting error: `ValueError: shape mismatch: objects cannot be broadcast to a single shape` – ShanZhengYang Nov 04 '15 at 18:59
  • Is there really a reason to have it as a numpy array instead of just a list? Why not try `output = [transdict[letter] for letter in abc_array]` followed by `print(output)`? – Vadim Feb 17 '16 at 18:38

2 Answers

14

With brute-force NumPy broadcasting -

# list(...) is needed in Python 3, where keys()/values() return views
idx = np.nonzero(np.array(list(transdict.keys())) == abc_array[:,None])[1]
out = np.asarray(list(transdict.values()))[idx]

With np.searchsorted based searching and indexing -

keys = np.array(list(transdict.keys()))  # list(...) is needed in Python 3
sort_idx = np.argsort(keys)
idx = np.searchsorted(keys,abc_array,sorter = sort_idx)
out = np.asarray(list(transdict.values()))[sort_idx][idx]

Sample run -

In [1]: abc_array = np.array(['B', 'D', 'A', 'B', 'D', 'A', 'C'])
   ...: transdict = {'A': 'Adelaide', 'B': 'Bombay', 'C': 'Cologne', 'D': 'Delhi'}
   ...: 

In [2]: idx = np.nonzero(transdict.keys() == abc_array[:,None])[1]
   ...: out = np.asarray(transdict.values())[idx]
   ...: 

In [3]: out
Out[3]: 
array(['Bombay', 'Delhi', 'Adelaide', 'Bombay', 'Delhi', 'Adelaide',
       'Cologne'], 
      dtype='|S8')

In [4]: sort_idx = np.argsort(transdict.keys())
   ...: idx = np.searchsorted(transdict.keys(),abc_array,sorter = sort_idx)
   ...: out = np.asarray(transdict.values())[sort_idx][idx]
   ...: 

In [5]: out
Out[5]: 
array(['Bombay', 'Delhi', 'Adelaide', 'Bombay', 'Delhi', 'Adelaide',
       'Cologne'], 
      dtype='|S8')
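
The sample run above comes from a Python 2 session. As a minimal sketch for Python 3, assuming the same sample data and assuming every letter in `abc_array` is a key of `transdict`, both approaches might look like this (the names `keys`, `values`, `out_broadcast` and `out_sorted` are illustrative, not part of the original answer):

```python
import numpy as np

abc_array = np.array(['B', 'D', 'A', 'B', 'D', 'A', 'C'])
transdict = {'A': 'Adelaide', 'B': 'Bombay', 'C': 'Cologne', 'D': 'Delhi'}

# Dict views must be turned into lists before NumPy can build arrays from them.
keys = np.array(list(transdict.keys()))
values = np.array(list(transdict.values()))

# Broadcasting: compare every element against every key.
# The result is a boolean matrix of shape (len(abc_array), len(keys)).
match = abc_array[:, None] == keys
# Column index of the single True in each row = position of the matching key.
idx_broadcast = np.nonzero(match)[1]
out_broadcast = values[idx_broadcast]

# searchsorted: binary search in the sorted keys instead of a full comparison.
sort_idx = np.argsort(keys)
idx_sorted = np.searchsorted(keys, abc_array, sorter=sort_idx)
out_sorted = values[sort_idx][idx_sorted]

print(out_broadcast)
print(out_sorted)
```

The broadcasting version allocates one boolean per (element, key) pair, so its memory use grows with the product of the two lengths. The `searchsorted` version avoids that matrix, but it does not check that a letter is actually present: a missing letter either maps to a neighbouring key or raises an `IndexError`.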
Divakar
  • Didn't you just answer another dictionary question with searchsorted? :) – hpaulj Nov 04 '15 at 20:44
  • 1
    @hpaulj Isn't that fun!? ;) *NumPythonic* everything :) – Divakar Nov 04 '15 at 20:44
  • Something which caught me by surprise: the first method will fail on large data structures with unhelpful error messages. As far as I can tell `==` starts returning `True`/`False` above some limit rather than broadcasting. – AnnanFay Dec 07 '17 at 00:07
  • 5
    For Python 3 both `keys()` and `values()` have to be wrapped in `list` statements. – magum Jun 03 '20 at 11:04
  • Reading your answers always blows my mind haha, cool to have you doing all this! – CodeNoob Dec 29 '21 at 17:16
11

Will this do? Sometimes, plain Python is a good, direct way to handle such things. The code below builds a list of translations (easily converted back to a numpy array) and prints them joined by spaces.

import numpy as np
abc_array = np.array(['B', 'D', 'A', 'F', 'H', 'I', 'Z', 'J'])

transdict = {'A': 'Adelaide',
             'B': 'Bombay',
             'C': 'Cologne',
             'D': 'Dresden',
             'E': 'Erlangen',
             'F': 'Formosa',
             'G': 'Gdansk',
             'H': 'Hague',
             'I': 'Inchon',
             'J': 'Jakarta',
             'Z': 'Zambia'
}

phoenetic = [transdict[letter] for letter in abc_array]
print(' '.join(phoenetic))

The output from this is:

Bombay Dresden Adelaide Formosa Hague Inchon Zambia Jakarta
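
As a rough sketch of the "converted back to a numpy array" step, and of how the same list comprehension could handle the column-shaped `(700, 1)` array mentioned in the question's comments: the helper name `translate` is hypothetical, and the dictionary is trimmed to the letters used here.

```python
import numpy as np

abc_array = np.array(['B', 'D', 'A', 'F', 'H', 'I', 'Z', 'J'])
transdict = {'A': 'Adelaide', 'B': 'Bombay', 'D': 'Dresden', 'F': 'Formosa',
             'H': 'Hague', 'I': 'Inchon', 'J': 'Jakarta', 'Z': 'Zambia'}


def translate(letters, mapping):
    """Hypothetical helper: look up each element and keep the input's shape."""
    letters = np.asarray(letters)
    # ravel() yields scalar strings even for 2-D input, so dict lookups work.
    flat = [mapping[letter] for letter in letters.ravel()]
    return np.array(flat).reshape(letters.shape)


translated = translate(abc_array, transdict)

# Same data as a column vector, like the (700, 1) array from the comments.
column = abc_array.reshape(-1, 1)
translated_column = translate(column, transdict)

print(translated)
print(translated_column.shape)
```

Iterating directly over a 2-D array would yield one-element rows rather than strings, and those rows cannot be used as dictionary keys, which is why the sketch flattens first and reshapes afterwards.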
Prune
  • I think you're right; a Python for loop is the way to go. – ShanZhengYang Nov 04 '15 at 19:11
  • 1
    Fine ... but you might wait around for other ideas, especially the pure numpy ones. Don't forget to "accept" your favourite once you have a good solution. – Prune Nov 04 '15 at 19:37
  • A solution based on @Prune's answer for those who have elements that shouldn't be replaced, so that "transdict" contains only the elements that should be replaced: `phoenetic = [(transdict[letter] if letter in transdict else letter) for letter in abc_array]` – CyxouD Nov 10 '22 at 14:17
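
The fallback described in the last comment can also be written with `dict.get`, which takes a default to return when the key is missing. A small sketch with made-up sample data (the letter 'X' and the name `partial_dict` are assumptions chosen for illustration):

```python
import numpy as np

abc_array = np.array(['B', 'X', 'A'])
# Only the letters that should be replaced are listed; 'X' is deliberately absent.
partial_dict = {'A': 'Adelaide', 'B': 'Bombay'}

# get(letter, letter) returns the translation, or the letter itself if unmapped.
phoenetic = np.array([partial_dict.get(letter, letter) for letter in abc_array])

print(phoenetic)
```

This performs one dictionary lookup per element instead of the two (`in` followed by indexing) in the conditional expression, though for a few hundred letters either form should be fine.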