I have a numpy array:

a = [[0 1 2 3 4]
     [0 1 2 3 4]
     [0 1 2 3 4]]

I have a dictionary with values I want to substitute/map:

d = { 0 : (   0,   1 ),
      1 : ( 100, 101 ),
      2 : ( 200, 201 ),
      3 : ( 300, 301 ),
      4 : ( 400, 401 )}

So that I end up with:

a = [[(0, 1) (100, 101) (200, 201) (300, 301) (400, 401)]
     [(0, 1) (100, 101) (200, 201) (300, 301) (400, 401)]
     [(0, 1) (100, 101) (200, 201) (300, 301) (400, 401)]]

According to this SO answer, one way to do a value map based on a dictionary is:

b = np.copy( a )
for k, v in d.items(): b[ a == k ] = v

This works when the key and the value have the same data type. In my case, however, the key is an int while the new value is a tuple (of ints), so I get a "cannot assign 2 input values" error.

Instead of b = np.copy( a ), I have tried:

b = a.astype( ( np.int, 2 ) )

However, I get the reasonable error of ValueError: could not broadcast input array from shape (3,5) into shape (3,5,2).

So, how can I go about mapping ints to tuples in a numpy array?

Jet Blue
  • Do you really **need** numpy arrays for this? Before you start using record-arrays, structured-arrays or object-arrays you should at least consider using a plain python list instead. :) – MSeifert Apr 04 '17 at 20:18
  • Yes, I'm using pygame and I need to blit a numpy array. Specifically one with ( r, g, b ) values. **Speed is critical** so I want to do all the mapping using numpy. I have been doing the mapping using standard python, but it is not fast enough. – Jet Blue Apr 04 '17 at 20:37
  • Could you maybe add more context then? Is it only `0` and `1` you want to replace or also `2`, `3` and `4`? What should your final array look like? But keep in mind that actually inserting `tuple`s in a numpy array will create an object array that isn't faster (could also be slower) than a python list. – MSeifert Apr 04 '17 at 20:41
  • Updated question accordingly. The numbers in the original array correspond to ( r, g, b ) tuples. In the end I want a numpy array of rgb tuples given a numpy array of int codes. – Jet Blue Apr 04 '17 at 20:46
  • How about a (3,5,2) shaped array? `np.array(list(adict.values()))[a]` – hpaulj Apr 04 '17 at 20:48
  • @hpaulj, I don't understand your suggestion. Can you elaborate? – Jet Blue Apr 04 '17 at 20:52
  • @hpaulj Does that really work? I think this could be problematic because dictionaries can be unordered (except for python 3.6). :) – MSeifert Apr 04 '17 at 20:53
  • Yes, it does depend on creating an array from the dictionary with all values in order. I suggested the simplest approach, which might not apply in all cases. @Panzer has more robust ways of creating the `out` array. – hpaulj Apr 04 '17 at 21:09

2 Answers

How about this?

import numpy as np

data = np.tile(np.arange(5), (3, 1))

lookup = { 0 : ( 0, 1 ),
           1 : ( 100, 101 ),
           2 : ( 200, 201 ),
           3 : ( 300, 301 ),
           4 : ( 400, 401 )}

# get keys and values, make sure they are ordered the same
keys, values = zip(*lookup.items())

# making use of the fact that the keys are non negative ints
# create a numpy friendly lookup table
out = np.empty((max(keys) + 1,), object)
out[list(keys)] = values

# now out can be used to look up the tuples using only numpy indexing
result = out[data]
print(result)

prints:

[[(0, 1) (100, 101) (200, 201) (300, 301) (400, 401)]
 [(0, 1) (100, 101) (200, 201) (300, 301) (400, 401)]
 [(0, 1) (100, 101) (200, 201) (300, 301) (400, 401)]]
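
If relying on how NumPy converts a tuple of tuples during assignment feels fragile, the same object lookup table could also be filled one key at a time. This is only a sketch of an alternative construction, not part of the original answer; `data` and `lookup` are redefined so the snippet runs on its own.

```python
import numpy as np

data = np.tile(np.arange(5), (3, 1))
lookup = {0: (0, 1), 1: (100, 101), 2: (200, 201), 3: (300, 301), 4: (400, 401)}

# Object array: each slot can hold an arbitrary Python object (here, a tuple)
out = np.empty(max(lookup) + 1, dtype=object)
for key, value in lookup.items():
    out[key] = value  # a scalar index stores the tuple itself in that slot

# Same lookup as before: index the table with the array of codes
result = out[data]

print(result.dtype)   # object
print(result[0, 1])   # (100, 101)
```

Because `result` has dtype `object`, its elements are ordinary Python tuples, so later per-element work on it runs at Python speed.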

Alternatively, it may be worth considering using an integer array:

out = np.empty((max(keys) + 1, 2), int)
out[list(keys), :] = values

result = out[data, :]
print(result)

prints:

[[[  0   1]
  [100 101]
  [200 201]
  [300 301]
  [400 401]]

 [[  0   1]
  [100 101]
  [200 201]
  [300 301]
  [400 401]]

 [[  0   1]
  [100 101]
  [200 201]
  [300 301]
  [400 401]]]
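
Since the comments say the codes ultimately stand for ( r, g, b ) colours, the integer-array variant can be sketched as a palette lookup. This is a minimal sketch, not part of the original answer: the names `palette`, `codes` and `table`, the colour values, and the `uint8` dtype are assumptions made for illustration. The table is filled by key, so it does not rely on dictionary ordering.

```python
import numpy as np

# Hypothetical palette: int code -> (r, g, b); the colours are made up for illustration
palette = {0: (0, 0, 0),
           1: (255, 0, 0),
           2: (0, 255, 0),
           3: (0, 0, 255),
           4: (255, 255, 255)}

codes = np.tile(np.arange(5), (3, 1))  # same layout as `data` above, shape (3, 5)

# One row per possible code, one column per colour channel
table = np.zeros((max(palette) + 1, 3), dtype=np.uint8)
for code, rgb in palette.items():
    table[code] = rgb  # fill by key, independent of dict iteration order

# Fancy indexing maps every code to its table row in a single NumPy operation
rgb_image = table[codes]

print(rgb_image.shape)  # (3, 5, 3)
```

If the lookup table does not change, `table` only needs to be built once, and each subsequent mapping costs just `table[codes]`.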
Paul Panzer
  • I'm trying to use numpy built-ins to do the mapping. – Jet Blue Apr 04 '17 at 21:04
  • @JetBlue does the lookup table change? Because if not you can reuse `out` and I doubt it gets much faster than `result = out[data]`. – Paul Panzer Apr 04 '17 at 21:09
  • It definitely depends on the size of the array, the number of keys in the dictionary and the range of values of these keys which approach will be the fastest. :) Also, the result array might not be contiguous anymore with certain kinds of indexing, so subsequent operations might be much slower. – MSeifert Apr 04 '17 at 21:11
  • @Jet Blue, what built-ins are you talking about? Paul is using built-in indexing, `out[a]`. – hpaulj Apr 04 '17 at 21:13
  • @MSeifert It is true that too large a spread of keys may make this approach infeasible. Re contiguity: since the result is newly created, I can't see why it shouldn't be contiguous. – Paul Panzer Apr 04 '17 at 21:40

You could use a structured array (that's like using tuples but you don't lose the speed advantage):

>>> rgb_dtype = np.dtype([('r', np.int64), ('g', np.int64)])
>>> arr = np.zeros(a.shape, dtype=rgb_dtype)
>>> for k, v in d.items():
...     arr[a==k] = v
>>> arr
array([[(  0,   1), (100, 101), (200, 201), (300, 301), (400, 401)],
       [(  0,   1), (100, 101), (200, 201), (300, 301), (400, 401)],
       [(  0,   1), (100, 101), (200, 201), (300, 301), (400, 401)]], 
      dtype=[('r', '<i8'), ('g', '<i8')])

The for-loop could probably be replaced with a faster operation. However, if `a` contains very few distinct values compared to its total size, this should be fast enough.
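
One way the loop over boolean masks might be replaced, sketched here under the assumption that the keys are small non-negative ints (as in the question): build a structured lookup table once and index it with `a`. The name `table` is introduced for illustration only.

```python
import numpy as np

a = np.tile(np.arange(5), (3, 1))
d = {0: (0, 1), 1: (100, 101), 2: (200, 201), 3: (300, 301), 4: (400, 401)}

rgb_dtype = np.dtype([('r', np.int64), ('g', np.int64)])

# Structured lookup table: position k holds the record for key k
table = np.zeros(max(d) + 1, dtype=rgb_dtype)
for k, v in d.items():
    table[k] = v  # loops over the dictionary once, not over the array

# A single indexing operation replaces the per-key boolean masks
arr = table[a]

print(arr.shape)     # (3, 5)
print(arr['r'][0])   # [  0 100 200 300 400]
```

The remaining Python loop touches each dictionary entry once, so its cost no longer depends on the size of `a`.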

MSeifert
  • The application I'm using (pygame) is fixed on the array type it accepts. I don't have the flexibility to change this. – Jet Blue Apr 04 '17 at 21:12
  • I don't understand. If you look at the output this is exactly what you wanted. (except for the dtype but there is no dtype that allows tuples except `object`). There is nothing in the question about pygame. Please update your question with the **exact** limitations. – MSeifert Apr 04 '17 at 21:17