4

I have a set of integers from a label column in a CSV file - [1,2,4,3,5,2,..]. The number of classes is 5 ie range of 1 to 6. I want to one-hot encode them using the below code.

y = df.iloc[:,10].values
y = tf.keras.utils.to_categorical(y, num_classes = 5)
y

But this code gives me an error

IndexError: index 5 is out of bounds for axis 1 with size 5

How can I fix this?

Innat
  • 16,113
  • 6
  • 53
  • 101
emmasa
  • 177
  • 2
  • 7

1 Answers1

7

If you use tf.keras.utils.to_categorical to one-hot the label vector, the integers should start from 0 to num_classes, source. In your case, you should do as follows

import tensorflow as tf 
import numpy as np 

a = np.array([1,2,4,3,5,2,4,2,1])
y_tf = tf.keras.utils.to_categorical(a-1, num_classes = 5)
y_tf

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.]], dtype=float32)

or, you can use pd.get_dummies,

import pandas as pd 
import numpy as np 

a = np.array([1,2,4,3,5,2,4,2,1])
a_pd = pd.get_dummies(a).astype('float32').values 
a_pd

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.]], dtype=float32)
Innat
  • 16,113
  • 6
  • 53
  • 101