In the Iris dataset, the flower labels in 'target_names' ('setosa', 'versicolor', 'virginica') are represented by a 'target' array whose values are 0, 1 or 2:
from sklearn.datasets import load_iris
iris = load_iris()
iris
'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]), 'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='|S10')}
Now I have a training data set which looks something like this:
> Photography 0.1 0.1 0.1 0.1 0.1
> Social 0.2 0.2 0.2 0.2 0.2
> Libraries and Demo 0.3 0.3 0.3 0.3 0.3
> Arcade and Action 0.4 0.4 0.4 0.4 0.4
> Health and Fitness 0.5 0.5 0.5 0.5 0.5
How can I change my labels ('Photography', 'Social', etc.) to be represented by target values, that is 0, 1, 2, etc., like we see in the Iris dataset?
There are 30 unique labels in total across 20,000 rows and 14,000 columns.
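
For illustration, here is a minimal sketch in plain Python of the kind of mapping being asked about. The `labels` list is a made-up sample based on the rows above (with one repeated label), and the names `target_names`, `name_to_id` and `target` are chosen only to mirror the Iris layout; they are assumptions, not part of any library.

```python
# Hypothetical sample: the label column of the training rows shown above,
# with 'Social' repeated to show that duplicates get the same integer.
labels = ["Photography", "Social", "Libraries and Demo",
          "Arcade and Action", "Health and Fitness", "Social"]

# Sorted unique label names, playing the role of iris['target_names'].
target_names = sorted(set(labels))

# Map each label name to its position in target_names.
name_to_id = {name: i for i, name in enumerate(target_names)}

# One integer per row, playing the role of iris['target'].
target = [name_to_id[name] for name in labels]

print(target_names)
print(target)  # [3, 4, 2, 0, 1, 4]
```

Indexing `target_names` with a value from `target` recovers the original label. scikit-learn's `LabelEncoder` (in `sklearn.preprocessing`) produces the same kind of sorted integer encoding via `fit_transform`, with the label names available afterwards in `classes_`, and is probably the more convenient choice at 20,000 rows.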