The input is a sequence of str, and the output is a sequence of int obtained by looking up each string in a mapping: Dict[str, int].
For example, if the input is ['foo', 'bar', 'baz'], and the mapping is {'foo': 1, 'bar': 2, 'baz': 3}, then the result should be [1, 2, 3].
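To make the task concrete, here is a minimal sketch of the pure Python baseline, assuming it is a plain list comprehension over the input. The name lookup_py is hypothetical and only used for illustration.

```python
from typing import Dict, List, Sequence


def lookup_py(arr: Sequence[str], mapping: Dict[str, int]) -> List[int]:
    """Pure Python baseline: one dict lookup per input string."""
    # Raises KeyError if a string is missing from the mapping.
    return [mapping[s] for s in arr]


result = lookup_py(['foo', 'bar', 'baz'], {'foo': 1, 'bar': 2, 'baz': 3})
print(result)  # [1, 2, 3]
```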
Below is my own implementation, written in a Jupyter notebook. It is faster than the pure Python version, but still not fast enough. Is there any way to improve its performance further? Thanks very much!
%%cython -c=-O3
import numpy as np
cimport cython
cimport numpy as np

np.import_array()

# np.int was removed in NumPy 1.24; a fixed-width type keeps the dtype and the C type in sync.
INT = np.int64
ctypedef np.int64_t INT_t

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef func(list arr, dict mapping):
    cdef int n = len(arr)
    cdef np.ndarray[INT_t, ndim=1] ret = np.empty(n, dtype=INT)
    cdef int i
    for i in range(n):
        # Each iteration is still a Python-level dict lookup on a str object.
        ret[i] = mapping[arr[i]]
    return ret
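For anyone who wants to reproduce the comparison, here is a rough sketch of a timing harness built on the standard timeit module. The helper best_time, the stand-in baseline function, and the test data are all assumptions made for illustration. In the notebook, the compiled func from the cell above could be passed in place of baseline.

```python
import timeit
from typing import Callable, Dict, List


def best_time(fn: Callable, arr: List[str], mapping: Dict[str, int],
              number: int = 10, repeat: int = 5) -> float:
    """Return the best per-call time in seconds over `repeat` runs."""
    timer = timeit.Timer(lambda: fn(arr, mapping))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def baseline(arr, mapping):
    # Stand-in for the pure Python version; swap in the Cython func to compare.
    return [mapping[s] for s in arr]


# Hypothetical test data: 1000 distinct keys, repeated to 10000 inputs.
keys = [f"key{i}" for i in range(1000)]
mapping = {k: i for i, k in enumerate(keys)}
arr = keys * 10

elapsed = best_time(baseline, arr, mapping)
print(f"baseline: {elapsed * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes, which is the same idea the %timeit magic is based on.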