What's the best way to apply a function to the row/col dimensions of a NumPy array

Question

I have a 3-dimensional numpy array. Intuitively it's 2-dimensional, where each row-col position represents an RGB color, which is stored as a vector of three numbers. (It would have been so much easier had the color been stored as a triple!) I have a function (based on the answer here) that converts an RGB triple to a color name. Is there a simple way (besides nested loops) to apply that function to the row-col elements of the array? (Applying it directly to the array itself doesn't work, since numpy attempts to apply the function to each element of the RGB vector.)

Thanks.

Similar: https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array — Mateen Ulhaq, Oct 12 '18 at 04:33
So you want a 2d array of strings - the color names? You'll get the most help if you show that function, and demonstrate how you'd use it on a small 3d array (loops are ok for this). — hpaulj, Oct 12 '18 at 05:21
What do you mean by `stored as a triple`? A triple of what? If your array is (n,m,3) shaped, then `arr[i,j,:]` is the 'triple' for one point, isn't it? — hpaulj, Oct 12 '18 at 05:24

score 1 · Answer 1 · answered Oct 12 '18 at 04:33

IIUC, you can just use np.dstack and reshape, or np.dstack and concatenate:

np.dstack(arr).reshape(-1,3)
# equivalent:
np.concatenate(np.dstack(arr))

For example:

>>> import numpy as np
>>> arr = np.random.randint(0,256,(3,5,5))
>>> arr
array([[[150,  38,  34,  41,  24],
        [ 76, 135,  93, 149, 142],
        [150, 123, 198,  11,  34],
        [ 24, 179, 132, 175, 218],
        [ 46, 233, 138, 215,  97]],

       [[194, 153,  29, 200, 133],
        [247, 101,  18,  70, 112],
        [164, 225, 141, 196, 131],
        [ 15,  86,  22, 234, 166],
        [163,  97,  94, 205,  56]],

       [[117,  56,  28,   1, 104],
        [138, 138, 148, 241,  44],
        [ 73,  57, 179, 142, 140],
        [ 55, 160, 240, 189,  13],
        [244,  36,  56, 241,  33]]])

>>> np.dstack(arr).reshape(-1,3)
array([[150, 194, 117],
       [ 38, 153,  56],
       [ 34,  29,  28],
       [ 41, 200,   1],
       [ 24, 133, 104],
       [ 76, 247, 138],
       [135, 101, 138],
       [ 93,  18, 148],
       [149,  70, 241],
       [142, 112,  44],
       [150, 164,  73],
       [123, 225,  57],
       [198, 141, 179],
       [ 11, 196, 142],
       [ 34, 131, 140],
       [ 24,  15,  55],
       [179,  86, 160],
       [132,  22, 240],
       [175, 234, 189],
       [218, 166,  13],
       [ 46, 163, 244],
       [233,  97,  36],
       [138,  94,  56],
       [215, 205, 241],
       [ 97,  56,  33]])

Using the function provided in the answer you linked, you can get the closest colors of that image:

>>> [get_colour_name(i)[1] for i in np.dstack(arr).reshape(-1,3)]
['darkseagreen', 'forestgreen', 'black', 'limegreen', 'seagreen', 'mediumaquamarine', 'grey', 'indigo', 'blueviolet', 'sienna', 'yellowgreen', 'yellowgreen', 'rosybrown', 'lightseagreen', 'darkcyan', 'midnightblue', 'palevioletred', 'blueviolet', 'powderblue', 'goldenrod', 'dodgerblue', 'chocolate', 'sienna', 'gainsboro', 'saddlebrown']
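A minimal sketch of the same idea that also restores the row/col shape. It assumes a channels-first array of shape (3, H, W), as in the example above; `name_pixel` is a hypothetical stand-in for `get_colour_name`, not the linked function. For an array that is already (H, W, 3), the `np.dstack` step is not needed and `arr.reshape(-1, 3)` is enough.

```python
import numpy as np

def name_pixel(rgb):
    # Hypothetical stand-in for get_colour_name: names the strongest channel.
    return ('red', 'green', 'blue')[int(np.argmax(rgb))]

# Channels-first toy image, shape (3, 2, 2)
chw = np.array([[[200, 10], [0, 90]],      # R plane
                [[10, 220], [0, 80]],      # G plane
                [[10, 10], [250, 70]]])    # B plane

pixels = np.dstack(chw)                    # shape (2, 2, 3), channels last
same = np.array_equal(pixels, chw.transpose(1, 2, 0))

# Flatten to one RGB row per pixel, call the function once per row,
# then fold the names back into the (H, W) grid.
flat = pixels.reshape(-1, 3)
names = np.array([name_pixel(p) for p in flat]).reshape(pixels.shape[:2])
print(names)
```

The function is still called once per pixel; the reshaping only replaces the two nested loops with a single flat one.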

Thanks. Sounds promising. I want to keep the original shape -- and essentially to replace the color vector with the color name. Is all this reshaping going to be any faster than a simple nested loop? — RussAbbott, Oct 12 '18 at 05:21
`np.array([get_colour_name(i)[1] for i in np.dstack(arr).reshape(-1,3)]).reshape(arr.shape[1:])` would get you the colour names in the proper shape. And almost certainly be faster than nested loops — sacuL, Oct 12 '18 at 05:27

Paul Panzer · Answer 2 · 2018-10-12T06:47:49.910

If your function is not designed to accept vector arguments, then there is no magic apart from the kind that does use loops and simply hides them, or maybe some JIT shenanigans, but I'm no expert on the latter.

Re the magic that secretly applies loops, that would be np.vectorize. To make it pass 1D subspaces to your function you can use the signature keyword

pseudo_vect_func = np.vectorize(your_func, ('O',), signature='(m)->()')

I've also added an otypes parameter (the second argument, `('O',)`, i.e. object dtype), because without it vectorize seems to blindly go for U1, i.e. truncate after the first letter.
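A small runnable sketch of that call, assuming an (H, W, 3) image. `name_pixel` is a hypothetical placeholder for `your_func`; the only requirement is that it takes one length-3 vector and returns one value.

```python
import numpy as np

def name_pixel(rgb):
    # Hypothetical stand-in for your_func: one length-3 vector in, one str out.
    r, g, b = rgb
    return 'greyish' if max(r, g, b) - min(r, g, b) < 30 else 'colourful'

# '(m)->()' says: consume the last axis (length m), produce a scalar.
# otypes=[object] stores whole Python strings in the result array.
name_image = np.vectorize(name_pixel, otypes=[object], signature='(m)->()')

img = np.array([[[120, 125, 130], [255, 0, 0]],
                [[0, 0, 10], [10, 200, 40]]])   # shape (2, 2, 3)
names = name_image(img)
print(names.shape)   # the two leading axes are kept
print(names)
```

This is still a Python-level loop over the pixels, so it is a convenience rather than a speed-up.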

If you want truly vectorized operation, here is a from-scratch method.

If you have a list or dictionary with (color name, (r, g, b)) values, and are ok with minimal distance matching, then you can utilize KDTrees for efficient lookup:

import numpy as np
from scipy.spatial import cKDTree as KDTree

# set up lookup

# borrow a list of named colors from matplotlib
from matplotlib import colors
named_colors = {k: tuple(int(v[i:i+2], 16) for i in range(1, 7, 2))
                for k, v in colors.cnames.items()}

no_match = named_colors['purple']

# make arrays containing the RGB values ...
color_tuples = list(named_colors.values())
color_tuples.append(no_match)
color_tuples = np.array(color_tuples)
# ... and another array with the names in same order
color_names = list(named_colors)
color_names.append('no match')
color_names = np.array(color_names)
# build tree
tree = KDTree(color_tuples[:-1])

def img2colornames(img, tolerance):
    # find closest color in tree for each pixel in picture
    dist, idx = tree.query(img, distance_upper_bound=tolerance)
    # look up their names
    return color_names[idx]

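# an example
# (face() was in scipy.misc in 2018; recent SciPy ships it as scipy.datasets.face)
from scipy.datasets import face
img = face()
result = img2colornames(img, 40)
# show a small patch (Image comes from Pillow)
from PIL import Image
Image.fromarray(img[410:510, 325:425]).show()
# same as names, downsampled
print(result[415:510:10, 330:425:10])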

Output:

[['darkgrey' 'silver' 'dimgray' 'darkgrey' 'black' 'darkslategrey'
  'silver' 'silver' 'dimgray' 'darkgrey']
 ['darkslategrey' 'gray' 'darkgrey' 'gray' 'darkslategrey' 'gray'
  'darkgrey' 'lightsteelblue' 'darkslategrey' 'darkslategrey']
 ['darkolivegreen' 'no match' 'dimgray' 'dimgray' 'darkslategrey' 'gray'
  'slategray' 'lightslategrey' 'dimgray' 'darkslategrey']
 ['dimgray' 'dimgray' 'gray' 'dimgray' 'dimgray' 'darkslategrey'
  'dimgray' 'dimgray' 'black' 'darkseagreen']
 ['no match' 'no match' 'darkolivegreen' 'dimgray' 'dimgray' 'no match'
  'darkkhaki' 'darkkhaki' 'no match' 'dimgray']
 ['darkkhaki' 'darkkhaki' 'darkkhaki' 'tan' 'tan' 'no match'
  'darkslategrey' 'no match' 'darkslategrey' 'dimgray']
 ['no match' 'no match' 'no match' 'no match' 'no match' 'no match'
  'no match' 'no match' 'no match' 'dimgray']
 ['no match' 'black' 'no match' 'no match' 'no match' 'no match'
  'no match' 'no match' 'no match' 'darkslategrey']
 ['darkkhaki' 'no match' 'olivedrab' 'darkolivegreen' 'darkolivegreen'
  'darkolivegreen' 'darkolivegreen' 'darkolivegreen' 'darkolivegreen'
  'darkolivegreen']
 ['darkseagreen' 'no match' 'no match' 'no match' 'no match' 'no match'
  'no match' 'no match' 'no match' 'no match']]
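For readers without SciPy or matplotlib at hand, here is a rough sketch of the same minimal-distance lookup using plain NumPy broadcasting instead of a KD-tree. This is a swapped-in brute-force technique, not the method above, and the five-colour palette and the name `nearest_names` are made up for illustration. It compares every pixel with every palette entry, so it only suits small palettes.

```python
import numpy as np

# Hypothetical palette; the answer borrows matplotlib's named colors instead.
palette_names = np.array(['black', 'white', 'red', 'lime', 'blue'])
palette_rgb = np.array([[0, 0, 0], [255, 255, 255],
                        [255, 0, 0], [0, 255, 0], [0, 0, 255]])

def nearest_names(img, tolerance):
    # Squared distance from every pixel to every palette entry: shape (H, W, P)
    diff = img[..., None, :].astype(float) - palette_rgb
    dist2 = (diff ** 2).sum(axis=-1)
    idx = dist2.argmin(axis=-1)
    # Object dtype so that 'no match' is not truncated to the palette's width
    names = palette_names[idx].astype(object)
    names[dist2.min(axis=-1) > tolerance ** 2] = 'no match'
    return names

img = np.array([[[10, 5, 0], [250, 250, 250]],
                [[200, 30, 20], [128, 128, 128]]])   # shape (2, 2, 3)
names = nearest_names(img, 100)
print(names)
```

As with `tree.query`, the whole (H, W, 3) image goes in at once and an (H, W) array of names comes out, with no per-pixel Python call.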

Thanks. I'm not so concerned about getting the best color name. I'm concerned about starting with an image whose color elements are 3-element lists rather than 3-tuples. As I understand your code, you expect the input to be a 2D array of color 3-tuples. The key function call is `img2colornames(face(), 40)`, which (via np magic) applies `img2colornames` to every element of the array. My array is a 2D array of lists, each with three elements. So np magic will attempt to apply `img2colornames` to each element of those 3-element lists. (Or am I misunderstanding?) — RussAbbott, Oct 12 '18 at 15:42
What I want is a version of np-magic that lets me tell it how far down in the array to go before applying the function. E.g., I would like to be able to write something like `np.apply(img2colornames, , 2)`, which will apply `img2colornames` to every `` element at the row-col level, even if those elements are themselves arrays. — RussAbbott, Oct 12 '18 at 15:48
@RussAbbott The `signature` keyword kind of does that---only from the other end, it lets you choose how much of the depth to keep. In the example I've given, it specifies that your function expects 1D arguments and returns 0D results. So `vectorize` will only loop through the first two dimensions and leave the last one intact. — Paul Panzer, Oct 12 '18 at 15:59
Thanks. That looks like what I want. Now I'm confused about how to write a valid signature. I tried `np.vectorize(lambda lst: tuple(lst), signature='(m,n,k) -> (m,n)')` hoping that this would convert the inner lists to tuples. But I get a diagnostic saying `not a valid gufunc signature: (m,n,k) -> (m,n)`. How should this be written? Thanks. — RussAbbott, Oct 12 '18 at 20:50
P.S. I tried letting the signature refer to the function rather than the entire array: `np.vectorize(lambda lst: tuple(lst), signature='(k) -> ()')`, but I got the same error message: `not a valid gufunc signature: (k) -> ()` — RussAbbott, Oct 13 '18 at 00:40
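A sketch of how the signature describes the function rather than the whole array. It assumes, as a guess, that the error above came from the spaces around `->`; the form without whitespace, as used in the answer, is the one shown here. `channel_sum` is a toy function chosen so the result can be checked against `img.sum(axis=-1)`.

```python
import numpy as np

def channel_sum(rgb):
    # Toy per-pixel function: receives one 1-D slice of length k.
    return int(rgb.sum())

# The signature names only the core dimensions the function itself sees:
# one 1-D input '(k)', one scalar output '()'. All leading axes are looped over.
summed = np.vectorize(channel_sum, otypes=[int], signature='(k)->()')

img = np.arange(24).reshape(2, 4, 3)
out = summed(img)
print(out.shape)
print(np.array_equal(out, img.sum(axis=-1)))
```

The row and column axes never appear in the signature; they are whatever is left over once the core dimension `k` has been matched to the last axis.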

score 0 · Answer 3 · answered Oct 12 '18 at 04:37

You could use map and try e.g.:

list(map(your_RGB2Name_function, pixels_2d))  # pixels_2d: one RGB row per pixel, e.g. arr.reshape(-1, 3)

Suppose you have a function, which works on a list of numbers

def dummy_fct(numlist):
    return '-'.join(map(str, numlist))

dummy_fct([1,2,3])
Out: '1-2-3'

which obviously does not work as intended when applied to a list of many of those number lists

dummy_fct([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Out: '[1, 2, 3]-[4, 5, 6]-[7, 8, 9]'

then you can use map, which iterates through an iterable (the outer list here, or in your case, the rows of a numpy array that holds one RGB vector per row) and applies the function to each sublist:

list(map(dummy_fct, [[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
Out: ['1-2-3', '4-5-6', '7-8-9']
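A short sketch of the same `map` call on an actual (H, W, 3) array, assuming the goal is an (H, W) grid of results. The array contents are arbitrary; `dummy_fct` is the function defined above.

```python
import numpy as np

def dummy_fct(numlist):
    return '-'.join(map(str, numlist))

arr = np.arange(1, 13).reshape(2, 2, 3)        # toy (H, W, 3) array

# map walks the first axis, so flatten row/col into one axis first ...
flat_names = list(map(dummy_fct, arr.reshape(-1, 3)))
# ... and fold the results back into the original row/col layout.
names = np.array(flat_names, dtype=object).reshape(arr.shape[:2])
print(names)
```

Calling `map` on the 3-D array directly would hand `dummy_fct` a whole (W, 3) row of pixels at a time, which is why the reshape comes first.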

SpghttCd

Thanks. That's essentially the same as a list comprehension or a nested loop. I was hoping for some numpy magic that would skim off the top two dimensions. – RussAbbott Oct 12 '18 at 05:19
What do you mean by `skim off the top two dimensions`? If your function only works with the 3 values of one point at a time, it has to be called, in one way or another, once for each point. – hpaulj Oct 12 '18 at 05:37
top _two_ dimensions? Please post either your function of choice or a dummy function which accepts exactly the same input parameters and returns exactly the same type of result value. Additionally, please post a sample array of your data to which you'd like the function to be applied, so that we can spend our effort on the problem instead of guessing your conditions. – SpghttCd Oct 12 '18 at 06:34
