My training data consists of strings like:
EMI3776438, U9BA7E, 20FXU84P, 4506067765, N8UZ00351
I am using the K-Neighbors classifier algorithm.
Right now, my method is to convert each letter to a number.
For example, a/A would map to 10, b/B would map to 11, and c/C would map to 12. After the conversion, I send this data to the K-Neighbors classifier.
So, for example, ABI37 becomes 1011I37.
The problem with this method is that both AA and 1010 will map to 1010, and there is no way for the algorithm to differentiate them and classify properly.
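To make the collision concrete, here is a minimal sketch of the conversion described above. It assumes the mapping simply continues through z/Z = 35 and that digits are left unchanged; the function name `encode` is only for illustration and is not from my actual code.

```python
def encode(text: str) -> str:
    """Replace each ASCII letter with a number (a/A -> 10, b/B -> 11, ...).

    Assumption: the mapping continues up to z/Z -> 35; every other
    character (such as a digit) is kept as-is.
    """
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            # Case-insensitive: 'a' and 'A' both become "10".
            out.append(str(ord(ch.upper()) - ord("A") + 10))
        else:
            out.append(ch)
    return "".join(out)


# Two different inputs end up with the same encoded value.
print(encode("AA"))
print(encode("1010"))
print(encode("AA") == encode("1010"))
```

Because the codes are simply concatenated with no separator or fixed width, the encoded string no longer records where one character ends and the next begins.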
Is there a good way to convert these strings to purely numeric values (since this algorithm only works on numbers) so that distinct strings stay distinct and the classification can be done correctly?