Project a multi-class array into a binary matrix

Question

I have a simple numpy array (e.g. [1,4,2,3,1]) and want to project it into a binary matrix, where each value in the array becomes a row with a single 1 in the column corresponding to that value.

For example, this array would map to a matrix like:

[1], => [1,0,0,0],
[4],    [0,0,0,1],
[2],    [0,1,0,0],
[3],    [0,0,1,0],
[1]     [1,0,0,0]

I can do this by iterating with list comprehensions, but is there an elegant numpy solution?

Divakar · Accepted Answer · 2016-10-21T19:15:12.783

We can use broadcasting -

(a[:,None] == np.arange(a.max())+1).astype(int)
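A self-contained sketch of the same one-liner, wrapped in a helper whose name (`one_hot_1based`) is hypothetical. It assumes the labels are integers in 1..a.max(). Note that `+` binds tighter than `==`, so `np.arange(a.max()) + 1` is built first and then compared against the column view of `a`:

```python
import numpy as np

def one_hot_1based(a):
    """Hypothetical helper: map integer labels 1..a.max() to indicator rows."""
    a = np.asarray(a)
    # a[:, None] has shape (n, 1); np.arange(a.max()) + 1 has shape (k,).
    # Broadcasting the comparison produces an (n, k) boolean array,
    # which astype(int) turns into 0/1 values.
    return (a[:, None] == np.arange(a.max()) + 1).astype(int)

print(one_hot_1based([1, 4, 2, 3, 1]))
```

Each row contains exactly one 1, in column `value - 1`.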

Sample run -

In [28]: a = np.array([1,4,2,3,1,2,1,4])

In [29]: a[:,None] == np.arange(a.max())+1 # Boolean array
Out[29]: 
array([[ True, False, False, False],
       [False, False, False,  True],
       [False,  True, False, False],
       [False, False,  True, False],
       [ True, False, False, False],
       [False,  True, False, False],
       [ True, False, False, False],
       [False, False, False,  True]], dtype=bool)

In [30]: (a[:,None] == np.arange(a.max())+1).astype(int) # Int array
Out[30]: 
array([[1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])
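For comparison, here is a sketch of a different technique that is not part of the answer above: preallocate a zero matrix and set one entry per row using integer-array (fancy) indexing. It makes the same assumption that the labels run from 1 to a.max():

```python
import numpy as np

a = np.array([1, 4, 2, 3, 1, 2, 1, 4])

# Alternative to the broadcasting comparison: start from zeros and write a 1
# at (row i, column a[i] - 1) for every row at once.
out = np.zeros((a.size, a.max()), dtype=int)
out[np.arange(a.size), a - 1] = 1
print(out)
```

This skips the intermediate boolean array that the broadcasting version builds before `astype(int)`. Both approaches still allocate a full n-by-k result.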

To map integers that are not sequential without producing all-False columns, we can compare the 2D version of the input array `a` directly against `np.unique(a)`, like so -

In [49]: a = np.array([14,12,33,71,97])

In [50]: a[:,None] == np.unique(a) # Boolean array
Out[50]: 
array([[False,  True, False, False, False],
       [ True, False, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True]], dtype=bool)

In [51]: (a[:,None] == np.unique(a)).astype(int) # Int array
Out[51]: 
array([[0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])
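The sample above has no repeated values, so it happens to come out square. With duplicates, the `np.unique` comparison still gives one column per distinct value. Keeping the result of `np.unique(a)` around also records which value each column stands for (the variable names here are illustrative):

```python
import numpy as np

a = np.array([14, 12, 33, 12, 97, 14])

# Sorted distinct values; column j of the result corresponds to cols[j].
cols = np.unique(a)
# (6, 1) compared against (4,) broadcasts to a (6, 4) indicator matrix.
onehot = (a[:, None] == cols).astype(int)
print(cols)
print(onehot)
```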

I'm not worried about the 0 indexing, but this result will always return a square matrix instead of one column per unique value. e.g. `a = np.array([1,4,2,3,1,2,2,2,2,2,2,2])` — Kirk Broadhurst, Oct 21 '16 at 19:03
@KirkBroadhurst Yeah I should have used `a.max()` there to create the range array. Fixed it. — Divakar, Oct 21 '16 at 19:03
I'm using this approach, but I think I need to use `np.unique(a).size` instead, and then map those unique values to the columns. `max` assumes that they are contiguous and start at 0. — Kirk Broadhurst, Oct 21 '16 at 19:05
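A sketch of the mapping described in this comment, assuming `np.unique(..., return_inverse=True)` is used to get each element's column index. It works whether or not the labels are contiguous or start at a particular value:

```python
import numpy as np

a = np.array([1, 4, 2, 3, 1, 2, 2, 2, 2, 2, 2, 2])

# uniq holds the sorted distinct values; inv[i] is the position of a[i] in uniq,
# i.e. the column that row i should mark.
uniq, inv = np.unique(a, return_inverse=True)
out = np.zeros((a.size, uniq.size), dtype=int)
out[np.arange(a.size), inv] = 1
print(out.shape)
```

`uniq[out.argmax(axis=1)]` recovers the original labels, which is a quick way to sanity-check the mapping.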
