
Simply put, what I'm trying to do is similar to this question: Convert RGB image to index image, but instead of a 1-channel index image, I want to get an n-channel image where img[h, w] is a one-hot encoded vector. For example, if the input image is [[[0, 0, 0], [255, 255, 255]]], and index 0 is assigned to black and 1 is assigned to white, then the desired output is [[[1, 0], [0, 1]]].

Like the person who asked the previous question, I have implemented this naively, but the code runs quite slowly, and I believe a proper solution using numpy would be significantly faster.

Also, as suggested in the previous post, I can preprocess each image into grayscale and one-hot encode the image, but I want a more general solution.

Example

Say I want to assign white to 0, red to 1, blue to 2, and yellow to 3:

(255, 255, 255): 0
(255, 0, 0): 1
(0, 0, 255): 2
(255, 255, 0): 3

I also have an image consisting of those four colors, stored as a 3D array containing the R, G, B values for each pixel:

[
    [[255, 255, 255], [255, 255, 255], [255,   0,   0], [255,   0,   0]],
    [[  0,   0, 255], [255, 255, 255], [255,   0,   0], [255,   0,   0]],
    [[  0,   0, 255], [  0,   0, 255], [255, 255, 255], [255, 255, 255]],
    [[255, 255, 255], [255, 255, 255], [255, 255,   0], [255, 255,   0]]
]

This is what I want to get, where each pixel is replaced by the one-hot encoding of its index. (Since converting a 2D array of index values to a 3D array of one-hot encoded values is easy, getting a 2D array of index values is fine too.)

[
    [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
    [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
]
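The index-to-one-hot step mentioned above can be sketched as follows. This is a minimal illustration, assuming the index image for the example has already been obtained; the names `index_img` and `num_classes` are hypothetical and not part of the original code.

```python
import numpy as np

# Hypothetical 2D array of class indices for the example image above.
index_img = np.array([
    [0, 0, 1, 1],
    [2, 0, 1, 1],
    [2, 2, 0, 0],
    [0, 0, 3, 3],
])
num_classes = 4

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing it with the index image yields shape (h, w, num_classes).
one_hot = np.eye(num_classes, dtype=np.uint8)[index_img]
```

Here `one_hot[0, 2]` is the vector for a red pixel (index 1), and `one_hot.argmax(-1)` recovers `index_img`.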

In this example I used colors where RGB components are either 255 or 0, but I don't want solutions to rely on that fact.

JiminP
  • You can just do this: https://stackoverflow.com/questions/16992713/translate-every-element-in-numpy-array-according-to-key – Blender May 10 '17 at 05:54
  • Thanks, but I want to change a vector to an index (or its one-hot encoding), but I think I can't do that with `np.vectorize`. – JiminP May 10 '17 at 06:01
  • So, the input image consists of only black and white pixels? – Divakar May 10 '17 at 07:33
  • No, currently the input image has three colors (white, red, blue), but I _don't_ want to rely on color values (i.e. using something like arr[img[:, :, 0] == 0]). – JiminP May 10 '17 at 07:46
  • What's `arr` and what's `img`? Could you add a more representative sample (a bit bigger and having more colors) and show us the expected output? – Divakar May 10 '17 at 08:05
  • I added example input and output. In the previous example I wrote down a form which I don't want: relying on that one of RGB value is unique to a color. `arr` is an array containing the result and `img` is an array representing the image. For example, `arr = np.zeros(img.shape[:2]); arr[img[:, :, 0] == 0] = 2; ...` and one-hot encoding `arr` might work but I don't want it because it's hard to generalize. – JiminP May 10 '17 at 08:27
  • Did the posted solution work for you? – Divakar May 11 '17 at 21:36
  • I stated many times that I don't want hard-to-generalize answers which rely on specific RGB components' values ("... but I don't want it because it's **hard to generalize**" in the previous comment, and specifically "In this example I used colors where RGB components are either 255 or 0, **but I don't want solutions to rely on that fact**" in the example), but your answer just did this... :( – JiminP May 12 '17 at 11:13
  • @JiminP Imagine you are trying to solve this problem, which you are, so that's not hard to imagine I am guessing. Now imagine you are creating a function solution. So, that function needs inputs. One of those inputs would be the array of 0s and 255s. The rest of the inputs would be how you define those color codes, i.e. `(255, 255, 255)` would be replaced by `0` and so on, which you have stated in the question. So, the question is how would you feed those color codes into that function? Would that be another array or a dictionary? – Divakar May 16 '17 at 12:01

3 Answers

1

My solution looks like this and should work for arbitrary colors:

color_dict = {0: (0,   255, 255),
              1: (255, 255,   0),
              ....}


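import numpy as np


def rgb_to_onehot(rgb_arr, color_dict):
    """Convert an (h, w, 3) RGB array to an (h, w, num_classes) one-hot array."""
    num_classes = len(color_dict)
    shape = rgb_arr.shape[:2] + (num_classes,)
    arr = np.zeros(shape, dtype=np.int8)
    # The dict keys are used as channel indices (assumes keys 0..num_classes-1)
    for i, color in color_dict.items():
        # A pixel belongs to class i when all three channels match its color
        arr[:, :, i] = np.all(rgb_arr == color, axis=-1)
    return arr


def onehot_to_rgb(onehot, color_dict):
    """Convert an (h, w, num_classes) one-hot array back to an (h, w, 3) RGB array."""
    single_layer = np.argmax(onehot, axis=-1)
    output = np.zeros(onehot.shape[:2] + (3,), dtype=np.uint8)
    for k, color in color_dict.items():
        output[single_layer == k] = color
    return output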

I haven't tested it for speed yet, but at least, it works :)

MonsterMax
0

We could compute a decimal equivalent for each pixel color by treating the three channels as bits. With each channel being either 0 or 255, there are 8 possible colors in total, but it seems we are only interested in four of them.

Then, there are two ways to solve it:

  • One is to map those decimal equivalents to consecutive indices starting from 0, one per color, then initialize an output array and assign into it using those indices.

  • The other is to use broadcasted comparisons of those decimal equivalents against the colors.

These two methods are listed next:

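import numpy as np


def indexing_based(a):
    # Encode each pixel as a 3-bit integer: R -> 4, G -> 2, B -> 1
    b = (a == 255).dot([4, 2, 1])
    # Codes of white, red, blue, yellow, in label order
    colors = np.array([7, 4, 1, 6])
    # Lookup table mapping each color code to its label
    # (entries for codes not listed in `colors` are left uninitialized)
    idx = np.empty(colors.max() + 1, dtype=int)
    idx[colors] = np.arange(len(colors))
    m, n, r = a.shape
    out = np.zeros((m, n, len(colors)), dtype=int)
    # Set a single 1 per pixel, in the channel given by its label
    out[np.arange(m)[:, None], np.arange(n), idx[b]] = 1
    return out


def broadcasting_based(a):
    b = (a == 255).dot([4, 2, 1])  # Same 3-bit color codes as above
    colors = np.array([7, 4, 1, 6])
    # Compare every pixel code against every color code: (m, n, 1) == (k,)
    return (b[..., None] == colors).astype(int)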

Sample run -

>>> a = np.array([
...     [[255, 255, 255], [255, 255, 255], [255,   0,   0], [255,   0,   0]],
...     [[  0,   0, 255], [255, 255, 255], [255,   0,   0], [255,   0,   0]],
...     [[  0,   0, 255], [  0,   0, 255], [255, 255, 255], [255, 255, 255]],
...     [[255, 255, 255], [255, 255, 255], [255, 255,   0], [255, 255,   0]],
...     [[255, 255, 255], [255,   0,   0], [255, 255,   0], [255,   0,   0]]])
>>> indexing_based(a)
array([[[1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0]],

       [[0, 0, 1, 0],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0]],

       [[0, 0, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]],

       [[1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1]],

       [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 1, 0, 0]]])
>>> np.allclose(broadcasting_based(a), indexing_based(a))
True
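The broadcasting idea does not strictly need the 0/255 assumption. As a rough sketch, the full RGB triplets can be compared directly against a palette. The function name `rgb_to_onehot_broadcast` and the variable `palette` are hypothetical and not part of the answer above; the palette is assumed to be a (k, 3) array listing the colors in label order.

```python
import numpy as np


def rgb_to_onehot_broadcast(img, colors):
    """Sketch: one-hot encode an (h, w, 3) image against a (k, 3) palette."""
    img = np.asarray(img)
    colors = np.asarray(colors)
    # (h, w, 1, 3) == (k, 3) broadcasts to (h, w, k, 3); a pixel matches
    # palette entry j when all three channels agree.
    matches = (img[:, :, None, :] == colors).all(axis=-1)
    return matches.astype(np.uint8)


palette = np.array([[255, 255, 255], [255, 0, 0], [0, 0, 255], [255, 255, 0]])
img = np.array([
    [[255, 255, 255], [255, 0, 0]],
    [[0, 0, 255], [255, 255, 0]],
])
onehot = rgb_to_onehot_broadcast(img, palette)

# Pixels whose color is missing from the palette get an all-zero vector.
unmatched = onehot.sum(axis=-1) == 0
```

The intermediate comparison holds h * w * k * 3 booleans, so for large images with many colors a per-color loop may use less memory.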
Divakar
0

A simple implementation involves masking the relevant pixel positions, whether converting from label to color or vice versa. I show here how to convert between dense (1-channel labels), OHE (one-hot encoding, sparse), and RGB formats, essentially performing OHE<->RGB<->dense.


Assume your RGB-encoded input is an array named seg.

First define the color label to color mapping (no need for a dict here):

>>> import numpy as np
>>> colors = np.array([[255, 255, 255],
...                    [255,   0,   0],
...                    [  0,   0, 255],
...                    [255, 255,   0]])

RGB (h, w, 3) to dense (h, w)

dense = np.zeros(seg.shape[:2])
for label, color in enumerate(colors):
    dense[np.all(seg == color, axis=-1)] = label

RGB (h, w, 3) to OHE (h, w, #classes)

Similar to the previous conversion, RGB to one-hot-encoding requires two additional lines:

ohe = np.zeros((*seg.shape[:2], len(colors)))
for label, color in enumerate(colors):
    v = np.zeros(len(colors))
    v[label] = 1
    ohe[np.all(seg == color, axis=-1)] = v

dense (h, w) to RGB (h, w, 3)

rgb = np.zeros((*dense.shape, 3))
for label, color in enumerate(colors):
    rgb[dense == label] = color

OHE (h, w, #classes) to RGB (h, w, 3)

Converting from OHE to dense requires one line:

dense = ohe.argmax(-1)

Then you can simply follow dense->RGB.
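As a possible shortcut for the dense->RGB step, the palette array can be indexed directly with the label map instead of looping over labels. This is only a sketch: the small `dense` array below is a hypothetical example, and it assumes the labels are stored with an integer dtype.

```python
import numpy as np

colors = np.array([[255, 255, 255],
                   [255,   0,   0],
                   [  0,   0, 255],
                   [255, 255,   0]], dtype=np.uint8)

# Hypothetical dense label map of shape (h, w) with integer class labels.
dense = np.array([[0, 1],
                  [2, 3]])

# Indexing the (k, 3) palette with an (h, w) integer array returns (h, w, 3).
rgb = colors[dense]

# One-hot input can be routed through argmax first.
ohe = np.eye(len(colors), dtype=np.uint8)[dense]
rgb_from_ohe = colors[ohe.argmax(-1)]
```

If the label map was created as a float array (the default for `np.zeros`), it would need a cast such as `dense.astype(int)` before being used as an index.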

Ivan