Suppose I have a dataset in the following format:
col1  col2  col3  col4       col5 (to be predicted)
12    13    4     primary    12
1     15    2     secondary  13
5     7     8     primary    18
14    12    44    college    6
col5 needs to be predicted for some test data using col1, col2, col3 and col4.
During training, col1, col2 and col3 can be fed as-is in an array to the classifier, but how should col4 be fed? I am aware that it is categorical and needs to be converted to a numeric type, but even after assigning numbers to the categories, it will still be nominal.
So if primary=1, secondary=2 and college=3, the numbers 1, 2 and 3 can't be compared by magnitude, because they are still just labels with no numerical significance.
So how should I proceed after this step? Should the values be normalized, or does something further need to be done?
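
To make the concern concrete, here is a minimal sketch in plain Python using the four rows above. It first shows why the integer codes are misleading, then builds a one-hot encoding of col4, which is one commonly used way to represent a nominal column. The one-hot step is my assumption about a possible approach, not something established above, and the names `label_codes`, `one_hot` and `features` are only illustrative.

```python
# Rows from the example table: (col1, col2, col3, col4)
rows = [
    (12, 13, 4, "primary"),
    (1, 15, 2, "secondary"),
    (5, 7, 8, "primary"),
    (14, 12, 44, "college"),
]

# Integer labels as described in the question.
label_codes = {"primary": 1, "secondary": 2, "college": 3}

# The codes imply distances the categories do not have: "college" ends up
# twice as far from "primary" as "secondary" is.
gap_secondary = abs(label_codes["secondary"] - label_codes["primary"])
gap_college = abs(label_codes["college"] - label_codes["primary"])

# One-hot encoding (assumed approach): one 0/1 column per category,
# so no category is numerically "larger" than another.
categories = sorted(label_codes, key=label_codes.get)  # primary, secondary, college

def one_hot(value):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]

# Numeric columns are kept as-is; col4 is expanded into three columns.
features = [[c1, c2, c3] + one_hot(c4) for c1, c2, c3, c4 in rows]

for row in features:
    print(row)
```

Each row of `features` now has six numeric entries (col1, col2, col3 and one indicator column per category) and could be passed to the classifier as an array. Whether the numeric columns also need scaling depends on the classifier being used.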