
I have a one-dimensional NumPy array a of length L, filled with integers from 0 to N-1.
I want to construct an N×L matrix such that in each column c, the a[c]-th entry is 1 and all other entries are 0.

For example, if L=4, N=5 and

a = np.array([1,2,0,4])

then we'd want a matrix

m = np.array([[0,0,1,0],
              [1,0,0,0],
              [0,1,0,0],
              [0,0,0,0],
              [0,0,0,1]])


Now, I have the following code:

def vectorize(a, L, N):
    m = np.zeros((N, L))
    for (i,x) in enumerate(a):
        m[x][i] = 1.0

    return m

This works fine, but I'm sure there is a faster method using some numpy trick (that avoids looping over a).

Mad Physicist
Jonas De Schouwer
  • The second linked duplicate is a good resource on one-hot encoding, I think `np.eye(a.max()+1)[a]` is a clean approach – user3483203 Sep 07 '19 at 17:55
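
The `np.eye` approach from the comment above can be sketched as follows. This is a minimal illustration, not the commenter's exact code: indexing the identity matrix with a yields one row per element of a, so the result has shape (L, N) and is transposed here to get the N×L layout asked for in the question. Passing N explicitly instead of `a.max()+1` is an assumption made for this sketch, so that the matrix keeps N rows even when the largest value does not occur in a.

```python
import numpy as np

a = np.array([1, 2, 0, 4])
N = 5  # number of possible values; given explicitly rather than inferred from a

# Row i of the identity matrix is the one-hot vector for value i,
# so indexing with a picks one one-hot row per element of a: shape (L, N).
one_hot_rows = np.eye(N, dtype=int)[a]

# Transpose so that each column c has its 1 in row a[c]: shape (N, L).
m = one_hot_rows.T
print(m.shape)
```

Note that this builds a full N×N identity matrix first, which may be wasteful when N is large.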

3 Answers

3

When you use an array of integers as an index, you need other arrays that broadcast to the same shape to indicate the placement in the other dimensions. In your case, each element of a is a row index. The corresponding column is:

b = np.arange(L)

Now you can index directly into the matrix m:

m = np.zeros((N, L), dtype=bool)
m[a, b] = True

When you index a numpy array, you should put all the indices in a single bracket operator, rather than chaining separate operators like m[a][b]. When a is an array of integers, m[a] is a copy of that portion of m, so assigning into m[a][b] never reaches m. When a is a single integer, m[a] is a view of the original data, which is the only reason your example works.
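
A small sketch of the difference, using the sample values from the question (the variable names `m_chained`, `m_single` and `m_loop` are made up for this illustration):

```python
import numpy as np

a = np.array([1, 2, 0, 4])   # row index for each column
b = np.arange(len(a))        # column indices 0..L-1

# Chained indexing with index arrays: m_chained[a] is a copy,
# so the assignment modifies the copy and m_chained stays all False.
m_chained = np.zeros((5, 4), dtype=bool)
m_chained[a][b] = True
print(m_chained.any())

# Single bracket: a and b are paired element-wise, so this sets
# m_single[a[i], b[i]] for every i.
m_single = np.zeros((5, 4), dtype=bool)
m_single[a, b] = True
print(m_single.sum())

# Chained indexing with plain integers: m_loop[1] is a view of row 1,
# so the write does reach the original array.
m_loop = np.zeros((5, 4))
m_loop[1][0] = 1.0
print(m_loop[1, 0])
```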

Mad Physicist
2

You can use np.arange(..) to supply the indices along the second axis:

def vectorize(a, L, N):
    m = np.zeros((N, L), int)
    m[a, np.arange(len(a))] = 1
    return m

So for the given sample input, we get:

>>> a = np.array([1,2,0,4])
>>> vectorize(a, 4, 5)
array([[0, 0, 1, 0],
       [1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])
Willem Van Onsem
2
def vectorize(a, L, N):
    m = np.zeros((N, L))
    m[a, np.arange(L)] = 1
    return m
one