Say I have a dataframe that looks like this:
Feature 1  Feature 2  Feature 3  Feature 4  Target
1          1          1          1          a
0          1          0          0          a
0          1          1          1          b
And a vector that looks like this:
0, 1, 1, 1
How can I find the indices of the rows that most closely match the vector? For example, to find the 2 closest rows, I would pass in the vector and the dataframe (perhaps with the target column removed) and get back indices 1 and 3, since those rows most closely resemble the vector "0, 1, 1, 1".
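To make the desired behaviour concrete, here is a minimal brute-force sketch in base R. It is only an illustration of the output I am after, not the tree-based solution I am looking for. The helper name closest_rows and the column names without spaces are my own assumptions, and "closest" is taken to mean the fewest mismatching features (Hamming distance).

```r
# Toy data from the question (column names without spaces are an assumption)
df <- data.frame(
  Feature1 = c(1, 0, 0),
  Feature2 = c(1, 1, 1),
  Feature3 = c(1, 0, 1),
  Feature4 = c(1, 0, 1),
  Target   = c("a", "a", "b")
)
query <- c(0, 1, 1, 1)

# Hypothetical helper: indices of the k rows with the fewest mismatching features
closest_rows <- function(features, vec, k) {
  m <- as.matrix(features)
  # sweep() compares every column with the matching element of vec
  mismatches <- rowSums(sweep(m, 2, vec, FUN = "!="))
  # order() sorts ascending, so the nearest row comes first
  order(mismatches)[seq_len(k)]
}

# Drop the Target column before computing distances
closest_rows(df[, names(df) != "Target"], query, k = 2)
# 3 1
```

The indices come back ordered from nearest to farthest, so row 3 (an exact match) is listed before row 1. Because the comparison only tests equality, the same sketch should also work when the feature columns are TRUE/FALSE instead of 1/0, but it scans every row and would not scale the way a KD-tree does.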
I have tried using the "caret" package in R, with the following code:
library(caret)

# 70/30 train/test split, stratified on the target
intrain <- createDataPartition(y = data$Target, p = 0.7, list = FALSE)
training <- data[intrain, ]
testing <- data[-intrain, ]
# k-NN tuned with 10-fold cross-validation, repeated 3 times
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
knn_fit <- train(Target ~ ., data = training, method = "knn", trControl = trctrl, preProcess = c("center", "scale"), tuneLength = 10)
# Predicted target for each row of the testing set
test_pred <- predict(knn_fit, newdata = testing)
print(test_pred)
However, this doesn't return the indices of the matching rows. It only returns the predicted target for each row of the testing set, based on the training rows whose features match it most closely.
I would like to find a model/command/function in R that behaves like KDTree from sklearn in Python, which can return the indices of the n closest rows. In addition, although this is not required, I would like it to work with categorical feature values (such as TRUE/FALSE), so that I don't have to create dummy variables as I've done here with 1's and 0's.