Is there a Numpy equivalent to string `translate`?

Question

If I have a string

>>> s = "abcdef"

I can replace more than one character with

>>> s.translate(str.maketrans("abc", "xyz"))
'xyzdef'

Is there a function that replicates this behavior in Numpy? i.e. some func such that

>>> np.func(arr, mapping)
mapped_array

This functionality can be implemented as

def func(arr, mapping):
    new_arr = arr.copy()
    for k, v in mapping.items():
        new_arr[new_arr == k] = v
    return new_arr

But this will be very slow for large arrays and/or large mappings.

I'd say it's not the same: the linked question is about translating "every element", whereas this question implicitly targets only those elements that correspond to a key in the mapping. Any objections? — Paul Panzer, Dec 18 '18 at 19:25
I think the proposed duplicate, https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key is not good. Despite the name, it's really about using a dictionary to replace elements of an array; IOW using a dictionary as an array mapping. I'm going to reopen this. — hpaulj, Dec 18 '18 at 19:32

score 2 · Accepted Answer · answered Dec 18 '18 at 19:39

This translate is a string operation. np.char has a bunch of functions that apply such methods to all elements of a string dtype array:

In [7]: s = "abcdef"
In [8]: arr = np.array([[s,s,s],[s,s,s]])
In [9]: arr
Out[9]: 
array([['abcdef', 'abcdef', 'abcdef'],
       ['abcdef', 'abcdef', 'abcdef']], dtype='<U6')
In [10]: np.char.translate(arr, str.maketrans("abc", "xyz"))
Out[10]: 
array([['xyzdef', 'xyzdef', 'xyzdef'],
       ['xyzdef', 'xyzdef', 'xyzdef']], dtype='<U6')

However, because it calls string methods, it is not particularly fast. Past tests have shown the functions to be about the same speed as explicit loops.

If there were a limited number of such replacements, you could use one of the mapping methods in the proposed duplicate. But if you want the full power of str.translate, this, or some iteration, is the best you can do. numpy does not implement string operations in compiled code.
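As a rough illustration of the "limited number of replacements" case, here is a minimal sketch of a dictionary-style mapping that leaves unmatched elements unchanged. It is not necessarily one of the methods from the linked question; the name `map_values` is hypothetical, and the sketch assumes a non-empty mapping whose keys are sortable and comparable with the array elements.

```python
import numpy as np

def map_values(arr, mapping):
    """Replace elements equal to a key in `mapping`; leave the rest unchanged.

    Hypothetical helper: assumes `mapping` is non-empty and its keys are sortable.
    """
    keys = np.array(list(mapping.keys()))
    vals = np.array(list(mapping.values()))
    order = np.argsort(keys)
    keys, vals = keys[order], vals[order]

    # Position where each element would be inserted among the sorted keys.
    idx = np.searchsorted(keys, arr)
    idx[idx == len(keys)] = 0          # clamp positions past the last key
    hit = keys[idx] == arr             # True only where the element is a key

    out = arr.copy()
    out[hit] = vals[idx[hit]]
    return out

arr = np.array([[1, 2, 3], [4, 5, 1]])
print(map_values(arr, {1: 10, 4: 40}))
```

The loop over the mapping is replaced by one `searchsorted` call, so the cost grows with the array size times the logarithm of the mapping size rather than their product.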

frompyfunc is a good way of applying a function to all elements of an array. It tends to be modestly faster than more explicit loops:

In [11]: np.frompyfunc(lambda s: s.translate(str.maketrans("abc", "xyz")),1,1)(arr)
Out[11]: 
array([['xyzdef', 'xyzdef', 'xyzdef'],
       ['xyzdef', 'xyzdef', 'xyzdef']], dtype=object)
In [12]: _.astype('U6')
Out[12]: 
array([['xyzdef', 'xyzdef', 'xyzdef'],
       ['xyzdef', 'xyzdef', 'xyzdef']], dtype='<U6')
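The lambda above rebuilds the translation table on every call. A small variation, sketched here with the hypothetical names `table` and `translate`, builds the table once and lets `astype(str)` pick the result width:

```python
import numpy as np

# Build the translation table once, outside the per-element function.
table = str.maketrans("abc", "xyz")

# frompyfunc wraps a Python callable as a ufunc that returns object arrays.
translate = np.frompyfunc(lambda s: s.translate(table), 1, 1)

arr = np.array([["abcdef", "cabbage"], ["fed", "abc"]])

# Convert the object array back to a fixed-width unicode array.
result = translate(arr).astype(str)
print(result)
```

This still calls a Python method per element, so it should not be expected to change the overall speed picture much.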

I recently had the need to demonstrate these general trends [in this gist](https://gist.github.com/juanarrivillaga/b6509486c9333db11b0eac78e7bc297c). I compared working with numpy arrays vs Python `list` objects for performing string-like operations using `fromfunc`. It is not exactly equivalent, because we are *starting* with a `numpy.ndarray` of numeric types; I suspect that with `object` type arrays it would be even worse. — juanpa.arrivillaga, Dec 18 '18 at 20:14

score 1 · Answer 2 · edited Jun 04 '20 at 18:43

There is no dictionary in numpy, so no direct equivalent mechanism.

If the "keys" fall in a short range (like Unicode characters), you can build a lookup table that replaces some keys, using indexing for fast replacement:

import numpy as np

data = np.random.randint(0, 256, 10)
# e.g. [179 111 211 147 204  11  20  38  87 230]

# identity table: every value maps to itself by default
lookup_table = np.arange(256)
lookup_table[[11, 20]] = [0, 1]  # mapping: 11 -> 0, 20 -> 1

lookup_table[data]
# [179 111 211 147 204   0   1  38  87 230]
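The same lookup-table idea can be applied to a string array by viewing its characters as integer code points. The following is only a sketch under stated assumptions: the array is contiguous with a native-byte-order unicode dtype, and every character is ASCII (code point below 128). The names `codes`, `lookup` and `result` are illustrative.

```python
import numpy as np

arr = np.array(["abcdef", "cabbage", "fed"])    # fixed-width unicode dtype

# View each string as a row of 4-byte code points; unused positions are 0.
codes = arr.view(np.uint32).reshape(arr.shape + (-1,))

# Identity table over ASCII (assumption: no code point >= 128 occurs).
lookup = np.arange(128, dtype=np.uint32)
src = np.array([ord(c) for c in "abc"])
dst = np.array([ord(c) for c in "xyz"], dtype=np.uint32)
lookup[src] = dst                               # a -> x, b -> y, c -> z

# Fancy indexing translates every character in one vectorized step.
translated = lookup[codes]

# Reinterpret the code points as strings of the original width.
result = translated.view(arr.dtype).reshape(arr.shape)
print(result)
```

Unlike `str.translate`, this only handles one-to-one character replacement; deleting characters or mapping one character to several would change string lengths and does not fit a fixed-width lookup.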
