
I have a NumPy array with shape (1, m), where each entry n is an integer from 0 to 9.

I want to create a new matrix of shape (m, 10) in which every entry is 0, except that row i has a 1 in column n, where n is the i-th entry of the original array.

For example:

[2, 3, 1] -> [[0, 0, 1, 0, ...], [0, 0, 0, 1, ...], [0, 1, 0, 0, ...]]

The code I wrote for it that works is:

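import numpy as np

y_values = np.array([[2, 3, 6, 4, 7]])
# Note: this allocates shape (10, m), one column per sample,
# which is the transpose of the (m, 10) layout described above.
y = np.zeros((10, y_values.shape[1]))
for i in range(y_values.shape[1]):
    y[y_values[0][i]][i] = 1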

Is there a way to get rid of the for loop and make this more efficient?

Mad Physicist
Yongjun Lee
  • @roy-cha. Don't delete your answer, it's totally fine. I just meant that OP wanted to use 10 instead of max+1. – Mad Physicist Dec 25 '19 at 03:47
    Does this answer your question? [Convert array of indices to 1-hot encoded numpy array](https://stackoverflow.com/questions/29831489/convert-array-of-indices-to-1-hot-encoded-numpy-array) – Mykola Zotko Dec 25 '19 at 04:37

2 Answers


As you would expect, there is a way, using fancy (integer array) indexing. You supply two index arrays, one per axis, which together give the coordinates of every element to set. You already have the column indices in y_values[0]. The matching row indices are just np.arange(m):

m = y_values.shape[1]
# Use the built-in bool: the np.bool alias is missing in some NumPy releases
result = np.zeros((m, 10), dtype=bool)
result[np.arange(m), y_values[0]] = True
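
To make the equivalence with the loop concrete, here is a minimal sketch that builds the (m, 10) matrix both ways and checks that they agree. The names `num_classes`, `expected`, and `one_hot` are illustrative only, and an integer dtype is used instead of bool so the output contains 0s and 1s as in the question.

```python
import numpy as np

y_values = np.array([[2, 3, 6, 4, 7]])  # shape (1, m), as in the question
m = y_values.shape[1]
num_classes = 10  # assumed fixed at 10, not inferred from the data

# Reference result: explicit loop, one row per sample.
expected = np.zeros((m, num_classes), dtype=int)
for i in range(m):
    expected[i, y_values[0, i]] = 1

# Vectorized version: row i is paired with column y_values[0, i].
one_hot = np.zeros((m, num_classes), dtype=int)
one_hot[np.arange(m), y_values[0]] = 1

assert np.array_equal(one_hot, expected)
print(one_hot.shape)
```

If the (10, m) layout from the question's loop is what you actually need, `one_hot.T` gives it without copying the data.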
Mad Physicist

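Another solution, provided you are certain that every class from 0 to 9 appears in the data (get_dummies only creates a column for each value it actually sees), would be:

import pandas as pd

df = pd.get_dummies([2, 3, 1, 4]).T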
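
If some classes might be absent, one possible workaround is to declare the full set of categories up front so that pandas emits a column for each of them. This is only a sketch under that assumption: the names `values`, `categories`, and `one_hot` are illustrative, and it produces the (m, 10) layout rather than the transposed one.

```python
import pandas as pd

values = [2, 3, 1, 4]  # classes 0 and 5-9 never occur here

# Declaring all ten categories makes get_dummies create ten columns,
# including all-zero columns for classes that are absent from the data.
categories = pd.Categorical(values, categories=list(range(10)))
one_hot_df = pd.get_dummies(categories, dtype=int)

one_hot = one_hot_df.to_numpy()  # plain NumPy array, shape (m, 10)
print(one_hot.shape)
```

For purely numeric labels the NumPy indexing approach avoids the pandas dependency, so this variant is mainly useful when the data already lives in a DataFrame.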
thushv89