I'm going to try kNN classification on a dataset containing, among other features, one called "time of day". In the context of the application, Monday 23:58 is just as close to Tuesday 00:02 as Friday 00:04 is. What matters is the angle of the hour hand on the clock face. Were it not for that one circular feature, Euclidean distance would do.
So far I'm aware of `class::knn()` and `caret::knn3()`. However, I don't see a way to supply my own customized distance metric to them, or even a pre-calculated distance matrix. Do you know a way of doing this?
A possible alternative would be an extra step in data preparation: either replace the circular feature with two linear ones (an angle θ becomes a point (cos θ, sin θ)), or replicate data points in the training set across the 00:00 boundary so that the boundary vanishes: https://stats.stackexchange.com/questions/51908/nearest-neighbor-algorithm-for-circular-dimensions However, I'd prefer to avoid both replacing one dimension with two and creating copies of data points, if at all possible.
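To make the first option concrete, here is a minimal base R sketch of the (cos θ, sin θ) replacement. The helper name `time_to_xy` and the choice of minutes since midnight as the unit are my own assumptions for illustration, not something taken from a package.

```r
# Hypothetical helper: map minutes since midnight (0..1439) to a point
# on the unit circle, so that 23:58 and 00:02 end up next to each other.
time_to_xy <- function(minutes, period = 1440) {
  theta <- 2 * pi * (minutes %% period) / period
  cbind(tod_cos = cos(theta), tod_sin = sin(theta))
}

# 23:58, 00:02 and 00:04 expressed as minutes since midnight
m <- c(23 * 60 + 58, 2, 4)
xy <- time_to_xy(m)

# Euclidean (chord) distances between the embedded points
d <- as.matrix(dist(xy))
```

The Euclidean distance between two embedded points is the chord length 2·sin(Δθ/2), which grows monotonically with the angular difference Δθ up to half a period. So the ordering of neighbours along this feature is preserved, although its scale relative to the other features changes and would need to be chosen deliberately.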
Another way would be to calculate the distance matrix myself and then implement kNN. This sounds very much like reinventing the wheel.
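For reference, this is roughly what I mean by doing it myself: a minimal base R sketch, not a tuned implementation. All function names (`circ_dist`, `cross_dist`, `knn_from_dist`) and the `tod_scale` parameter are hypothetical, and the way the circular term is combined with the linear features is just one assumed choice.

```r
# Circular distance between two times of day, in minutes (period = 1440).
circ_dist <- function(a, b, period = 1440) {
  d <- abs(a - b) %% period
  pmin(d, period - d)
}

# Test-by-train distance matrix: Euclidean on the linear features,
# with the circular term added in quadrature.
# tod_scale is an assumed factor that puts minutes on a scale
# comparable to the other features.
cross_dist <- function(test_lin, test_tod, train_lin, train_tod, tod_scale = 1) {
  n_test <- nrow(test_lin)
  n_train <- nrow(train_lin)
  out <- matrix(0, n_test, n_train)
  for (i in seq_len(n_test)) {
    test_row <- matrix(test_lin[i, ], n_train, ncol(train_lin), byrow = TRUE)
    lin_sq <- rowSums((train_lin - test_row)^2)
    tod_sq <- (circ_dist(test_tod[i], train_tod) / tod_scale)^2
    out[i, ] <- sqrt(lin_sq + tod_sq)
  }
  out
}

# Majority vote over the k nearest training rows.
# Ties in the vote go to the first factor level (which.max behaviour).
knn_from_dist <- function(d, train_labels, k = 3) {
  train_labels <- as.factor(train_labels)
  pred <- apply(d, 1, function(row) {
    nn <- order(row)[seq_len(k)]
    names(which.max(table(train_labels[nn])))
  })
  factor(pred, levels = levels(train_labels))
}

# Toy data: one constant linear feature, so only time of day matters.
train_lin <- matrix(0, nrow = 4, ncol = 1)
train_tod <- c(23 * 60 + 58, 2, 12 * 60, 12 * 60 + 5)
train_lab <- c("night", "night", "noon", "noon")

test_lin <- matrix(0, nrow = 1, ncol = 1)
test_tod <- 4  # 00:04

d <- cross_dist(test_lin, test_tod, train_lin, train_tod)
pred <- knn_from_dist(d, train_lab, k = 2)
```

In this toy case the query at 00:04 is 6 minutes from 23:58 and 2 minutes from 00:02, so both of its two nearest neighbours lie across the midnight boundary. It works, but it is an O(n_test × n_train) loop with none of the conveniences of an existing kNN implementation, which is why I'd rather plug a metric into one.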
One more reason I'm looking for a way to plug in my own distance metric is the following. While the distance between Tuesday 15:01 and Wednesday 15:02 is 1 minute, Sunday 23:00 UTC (the currency exchange market opening) is considered "far" from any other day's 23:00. Other special cases might appear, too.