
I'm working on an image classification problem where the training labels are a 1-D numpy array, e.g. [1,2,3,2,2,2,4,4,3,1]. To one-hot encode them I used:

train_y = []
for label in train_label:
    if label == 0:
        train_y.append([1,0,0,0])
    elif label == 1:
        train_y.append([0,1,0,0])
    elif label == 2:
        train_y.append([0,0,1,0])
    elif label == 3:
        train_y.append([0,0,0,1])

I also need the length of each one-hot vector to equal len(set(train_labels)). The approach above is not a good method, though. Please recommend a better one.

NickD
Soumyajit
    Possible duplicate of [Convert array of indices to 1-hot encoded numpy array](https://stackoverflow.com/questions/29831489/convert-array-of-indices-to-1-hot-encoded-numpy-array) – Boris Verkhovskiy Apr 08 '19 at 18:00

2 Answers

1

It's always a good habit to use numpy for arrays. np.unique() determines the labels present in train_labels. ix holds array indices: np.nonzero() gives the indices of train_labels where train_labels == unique_tl[iy].

import numpy as np

train_labels = np.array([2,5,8,2,5,8])
unique_tl = np.unique(train_labels)

NL = len(train_labels)               # number of samples: 6
nl = len(unique_tl)                  # number of distinct labels: 3
target = np.zeros((NL,nl),dtype=int)

for iy in range(nl):
    ix = np.nonzero(train_labels == unique_tl[iy]) 
    target[ix,iy] = 1

gives

target
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

I'll think about a possibility to eliminate the for-loop.
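One possible way to drop the for-loop, offered as a sketch rather than a definitive implementation: np.unique() can also return, via return_inverse=True, the position of each label within the sorted unique labels. Those positions can then be used to pick rows from an identity matrix. The name inverse is introduced here for illustration only.

```python
import numpy as np

train_labels = np.array([2, 5, 8, 2, 5, 8])

# inverse[i] is the position of train_labels[i] within the sorted unique labels
unique_tl, inverse = np.unique(train_labels, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for class index k
target = np.eye(len(unique_tl), dtype=int)[inverse]
```

For the example labels this should produce the same target array as the loop version, with column k corresponding to unique_tl[k].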

If [2,5,8] is meant as part of [0,1,2,3,4,5,6,7,8], then you can use this answer

pyano
  • This is the exact solution I wanted ..Thank you very much for your time :-) – Soumyajit Apr 09 '19 at 08:44
  • Great :-). And since you are rather new here: If you find some useful answers here in StackOverflow then you can give an upvote (orange up-arrow on the left of the text field). If you have asked a question yourself and have got an answer, that solves your problem you might click "solved" (= check sign below the up/down arrows, which becomes green) too – pyano Apr 09 '19 at 08:54
  • Yep.. But for the upvote it says "Vote cast by those with less than 15 reputation are recorded but do not change the publicly displayed post score" :-( – Soumyajit Apr 09 '19 at 09:00
  • it is as it is … :-) – pyano Apr 09 '19 at 09:02
0

Make a vector of zeros and set only the entry at the label's index to 1:

target = np.zeros(num_classes)
target[label] = 1
train_y.append(target)
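The same idea can be applied to all labels at once. The sketch below assumes the labels are integers in the range 0 to num_classes - 1, so that they can be used directly as column indices; the sample train_label values and num_classes = 4 are made up for illustration.

```python
import numpy as np

# Hypothetical labels, assumed to lie in 0..num_classes-1
train_label = np.array([1, 2, 3, 2, 2, 2, 0, 0, 3, 1])
num_classes = 4  # assumed to be known in advance

# One row per sample, one column per class
train_y = np.zeros((len(train_label), num_classes), dtype=int)

# For each row i, set column train_label[i] to 1
train_y[np.arange(len(train_label)), train_label] = 1
```

If the labels are not contiguous integers starting at 0, they would first need to be mapped to such indices.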
blue_note
  • What if train_labels = [2,5,8,2,5,8], i.e. the labels are not from 0 to 4, where I could just use the label values as indices? – Soumyajit Apr 08 '19 at 18:06