Why does the NN perform better with one-hot encoding?

Question

I have a general question. I am working with the Poker Hand dataset, which has 10 possible outputs, 0-9, each of which stands for a poker hand, for example a royal flush.

I read online that one-hot encoding (OHE) is necessary in a multiclass problem because otherwise the labels would carry an artificial order, for example when the classes are cities. But in my case the poker hands do have an order, from one pair through straight and flush up to royal flush, right?
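A minimal sketch in plain Python of the "artificial order" point, assuming the ten classes are labelled 0-9 as in the dataset: with integer labels, the distance between two classes depends on how far apart their numbers are, while with one-hot vectors every pair of distinct classes is equally far apart. The helper names are illustrative only.

```python
import math

NUM_CLASSES = 10  # poker-hand classes are labelled 0..9


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Integer labels: class 0 looks "closer" to class 1 than to class 9.
int_gap_0_1 = abs(0 - 1)
int_gap_0_9 = abs(0 - 9)

# One-hot labels: every pair of distinct classes is sqrt(2) apart.
ohe_gap_0_1 = euclidean(one_hot(0), one_hot(1))
ohe_gap_0_9 = euclidean(one_hot(0), one_hot(9))

print(int_gap_0_1, int_gap_0_9)    # 1 9
print(ohe_gap_0_1 == ohe_gap_0_9)  # True
```

In Keras, `tf.keras.utils.to_categorical` produces the same kind of vectors from integer labels.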

My NN performs better with OHE, but it also works without it, just badly. So why does it work better with OHE? I used a dense network with 2 hidden layers.

LemonPy · Accepted Answer · 2020-05-03T10:19:34.623


Short answer: whether to use OHE depends on how the feature is used in the classification and on the implementation of the classifier. If a feature is a category whose numeric codes have no meaning as a ranking (for example, the suit of a card: 1=clubs, 2=hearts, ...), you should one-hot encode it (for frameworks that require categories to be distinguished explicitly), because ranking it is meaningless. If the feature's order does matter for the classification, keep it as-is (for example, the probability of getting a certain winning hand).
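A minimal sketch of that rule applied to the input features, assuming each card is given as a suit code 1-4 and a rank 1-13 (the layout of the UCI Poker Hand files): the suit is a pure category, so it is one-hot encoded, while the rank is kept as a number because its order carries meaning. `encode_card` and `encode_hand` are hypothetical helpers, not part of any library.

```python
NUM_SUITS = 4  # suit codes 1..4; the numbers carry no order


def encode_card(suit, rank):
    """One-hot encode the suit (a category) and keep the rank (ordered) as-is."""
    suit_vec = [0] * NUM_SUITS
    suit_vec[suit - 1] = 1  # shift the 1-based suit code to a 0-based index
    return suit_vec + [rank]


def encode_hand(cards):
    """Concatenate the encodings of a hand given as (suit, rank) pairs."""
    features = []
    for suit, rank in cards:
        features.extend(encode_card(suit, rank))
    return features


# Five (suit, rank) pairs, all with suit code 2.
hand = [(2, 10), (2, 11), (2, 12), (2, 13), (2, 1)]
print(len(encode_hand(hand)))  # 25 = 5 cards * (4 suit columns + 1 rank column)
print(encode_card(2, 11))      # [0, 1, 0, 0, 11]
```

Whether the rank column should also be scaled, or one-hot encoded as well, is a separate modelling choice.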

You did not specify what task you are using the NN for, which loss function, and many other details, so I can only assume that when you say "...my nn performs better with OHE" you want to classify a combination of cards into one of the poker-hand classes. In that scenario the labels simply present the learner with classes to distinguish between (as categories, not as ranks). You could add a ranking feature such as the probability and/or strength of the hand; whether adding it improves the resulting classifier is a separate topic (the relation between the number of features and classification performance).
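To illustrate "categories, not ranks" on the output side, here is a minimal sketch in plain Python with made-up scores: a softmax output layer scored with categorical cross-entropy against a one-hot target. The loss reduces to the negative log of the probability assigned to the true class, so how far a wrong class is from the true one numerically plays no role.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot_target, probs):
    """-sum(t * log p); with a one-hot target only the true-class term survives."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)


logits = [2.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # illustrative scores
probs = softmax(logits)
target = [1.0] + [0.0] * 9  # the true class is 0

loss = categorical_cross_entropy(target, probs)
print(math.isclose(loss, -math.log(probs[0])))  # True
```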

Hope I understood you correctly.

Note: this is a big question and there is a lot of hand-waving here, but that is the scope of this answer.

edited May 03 '20 at 10:19

answered May 03 '20 at 10:13

LemonPy


Thanks! But I think my data does represent an order, from rank 0 to 9, i.e. from "high card" to "royal flush". Isn't that an order? You assume that the labels just present the learner with classes to distinguish between... I am using 2 hidden layers with 400 units, sigmoid and softmax activations, binary cross-entropy as the loss, and RMSprop as the optimizer. You can see my code here: https://stackoverflow.com/questions/61559333/neural-network-why-is-my-code-not-reproducible – Eli Hektor May 04 '20 at 07:31
What are you trying to do with the net? What are your y-values (y_test, y_train)? What does your output layer represent? – LemonPy May 04 '20 at 12:33
As I said, you can see my entire code here: https://stackoverflow.com/questions/61559333/neural-network-why-is-my-code-not-reproducible – Eli Hektor May 04 '20 at 12:40
I am just trying to get the best accuracy possible. I use the well-known Poker Hand dataset from the UCI Machine Learning Repository, and I want to predict the poker hand, for example whether it is a royal flush or a straight. The classes are represented as the numbers 0-9. – Eli Hektor May 04 '20 at 12:41
As I said, you are using them as classes, like in the well-known digits dataset, where you categorize digits into classes even though the digits are ordered. – LemonPy May 04 '20 at 12:45
So does this mean that if you use the numbers 0-9 as classes, they are not ranked, even though the poker hands are ranked in reality? – Eli Hektor May 04 '20 at 12:48
Exactly, because you want to classify the hand into one of those classes. You can add features that encode the strength of the hand or the probability of getting that class, but in the eyes of the net (through the output layer you defined) you sort hands into named classes, much like classifying the digit "1" into the class "1" regardless of its rank. – LemonPy May 04 '20 at 12:52
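A minimal sketch of what this means in practice, in plain Python with hypothetical numbers: if the class names 0-9 are shuffled (and the network's output probabilities are shuffled the same way), a one-hot/cross-entropy setup sees exactly the same problem, whereas a squared-error loss on the raw integer label changes, because it treats the label numbers as magnitudes. The permutation `PERM` is arbitrary and only for illustration.

```python
import math

# A fixed relabelling of the ten classes (hypothetical; any permutation works).
PERM = [3, 7, 0, 9, 1, 8, 2, 6, 4, 5]  # old label i becomes PERM[i]


def cross_entropy(probs, label):
    """Categorical cross-entropy with a one-hot target for `label`."""
    return -math.log(probs[label])


def permute_probs(probs):
    """Move the probability of old class i to position PERM[i]."""
    out = [0.0] * len(probs)
    for old, new in enumerate(PERM):
        out[new] = probs[old]
    return out


probs = [0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.025, 0.025]
true_label = 1
predicted_class = 8  # the same wrong prediction, scored as a regression target

ce_before = cross_entropy(probs, true_label)
ce_after = cross_entropy(permute_probs(probs), PERM[true_label])

se_before = (predicted_class - true_label) ** 2
se_after = (PERM[predicted_class] - PERM[true_label]) ** 2

print(ce_before == ce_after)  # True: same problem under new names
print(se_before, se_after)    # 49 9: squared error depends on the numbering
```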
