I have a data set with approximately 30 features. All but one of them are similar to each other; the remaining one is a category (the result of a preprocessing step that generates clusters).
Each cluster is generally a set of rows whose features have similar numeric values, but there are often some outliers, as shown below.
For example (features are labeled A, B, C, etc.):
Note: I have converted the NaN values in the data to 0.
A   B   C   D   E   F   G   H   …   Cluster
78  0   0   67  48  35  0   0       1
0   67  0   66  45  35  0   0       1
0   0   0   68  44  38  0   0       1
0   0   0   66  43  36  0   0       1
78  50  67  0   0   0   0   0       2
75  55  60  0   0   0   0   0       2
77  54  61  0   0   78  0   0       2
Question: I need to be able to feed in a new feature set (a single row) and predict its cluster number. What would be the best classification algorithm for this task, given that the data contains these outliers and the rows within a cluster are only mostly similar?
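
To make the task concrete, here is a minimal sketch of the kind of call I am after. It uses a plain k-nearest-neighbours vote only as a placeholder baseline, not as a claim that it is the right algorithm. The function name predict_cluster, the choice k=3, and the new row are my own assumptions for illustration; the training rows are the sample rows from the table, restricted to features A to H.

```python
import math
from collections import Counter

# Sample rows from the table above: features A..H, then the cluster label.
ROWS = [
    ([78, 0, 0, 67, 48, 35, 0, 0], 1),
    ([0, 67, 0, 66, 45, 35, 0, 0], 1),
    ([0, 0, 0, 68, 44, 38, 0, 0], 1),
    ([0, 0, 0, 66, 43, 36, 0, 0], 1),
    ([78, 50, 67, 0, 0, 0, 0, 0], 2),
    ([75, 55, 60, 0, 0, 0, 0, 0], 2),
    ([77, 54, 61, 0, 0, 78, 0, 0], 2),
]


def predict_cluster(new_row, rows=ROWS, k=3):
    """Return the majority cluster among the k nearest training rows.

    Distance is plain Euclidean distance on the raw feature values
    (an assumption; scaling or another metric may suit the real data better).
    """
    nearest = sorted(rows, key=lambda r: math.dist(r[0], new_row))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical new sample (not from my real data).
new_row = [0, 0, 0, 67, 46, 37, 0, 0]
print(predict_cluster(new_row))
```

Whatever algorithm is recommended, this is the interface I need: one row of feature values in, one cluster number out.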