I am trying to predict values for a categorical variable using a KNN model in R.
To do this, I am using a function so that I can easily vary the dataset, % of observations, and k-value.
When I apply this function to a particular dataset though, I am getting an error.
EDIT: I am somewhat limited in how reproducible I can make this question, but I am adding the libraries so that it is clear which packages I am using:
library(dplyr)
library(class)
library(neuralnet)
library(nnet)
library(lubridate)
The data I am using is structured like this:
> head(crypto_data)
time btc_price eth_price block_size difficulty estimated_btc_sent estimated_transaction_volume_usd hash_rate
1 2017-09-02 21:54:00 1.622181 1.710355 0.9502574 -1.258379 -0.05186039 0.4346130 -0.7265456
2 2017-09-02 22:29:00 1.738889 1.970749 0.5771003 -1.258379 -0.07004424 0.4110978 -1.0477347
3 2017-09-02 23:04:00 1.705891 1.938885 0.4726202 -1.258379 -0.10641195 0.3755673 -0.9406717
4 2017-09-02 23:39:00 1.775354 2.159321 0.4144439 -1.258379 -0.14277966 0.3348643 -0.8871402
5 2017-09-03 00:14:00 2.028195 2.572964 0.2132932 -1.258379 -0.10641195 0.4305168 -1.0477347
6 2017-09-03 00:49:00 2.097871 2.504085 0.0190859 -1.258379 -0.14277966 0.3756431 -1.1547978
miners_revenue_btc miners_revenue_usd minutes_between_blocks n_blocks_mined n_blocks_total n_btc_mined n_tx nextretarget
1 1.0287278 1.699011 -0.43408783 0.37556660 -2.016092 0.37464164 0.04072815 -2.22295
2 0.6856301 1.417137 -0.11622241 0.04004961 -2.015293 0.06154488 -0.12441993 -2.22295
3 0.7955973 1.507554 -0.22217755 0.15188860 -2.008898 0.15100110 -0.05626304 -2.22295
4 0.8395842 1.543490 -0.29923583 0.20780810 -2.005700 0.19572920 -0.10762521 -2.22295
5 0.6812315 1.519311 -0.06806098 0.04004961 -2.003302 0.06154488 -0.09733929 -2.22295
6 0.5580682 1.416853 -0.03916412 -0.07178939 -2.000904 -0.07263945 -0.19824250 -2.22295
total_btc_sent total_fees_btc totalbtc trade_volume_btc trade_volume_usd targetVar
1 -0.9319080 2.703601 -2.551107 0.2518994 0.5783353 buy
2 -0.9698475 2.632490 -2.551107 0.2518994 0.5783353 buy
3 -0.9698475 2.638365 -2.551107 0.2518994 0.5783353 buy
4 -1.0077870 2.594611 -2.551107 0.2518994 0.5783353 buy
5 -1.0077870 2.628309 -2.551107 0.1465798 0.4688573 hold
6 -1.0267568 2.568152 -2.551107 0.1465798 0.4688573 hold
The function is:
knn_predFunc <- function(inData, k, trainPct) {
  trainP <- trainPct * .6
  valP <- trainPct * .2
  testP <- trainPct * .2
  # Split the data
  trainObs <- sample(nrow(inData), trainP * nrow(inData), replace = FALSE)
  valObs <- sample(nrow(inData), valP * nrow(inData), replace = FALSE)
  testObs <- sample(nrow(inData), testP * nrow(inData), replace = FALSE)
  # Create the training/validation/test datasets
  trainDS <- inData[trainObs,]
  valDS <- inData[valObs,]
  testDS <- inData[testObs,]
  # Separate the labels
  train_labels <- trainDS[,"targetVar"]
  # KNN
  knn_crypto_val_pred <- knn(trainDS, valDS, train_labels, k = k)
  knn_crypto_test_pred <- knn(trainDS, testDS, train_labels, k = k)
}
When I call knn_predFunc(crypto_data, 3, 1), I get the following error:
Error in knn(trainDS, valDS, train_labels, k = k) :
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(trainDS, valDS, train_labels, k = k) : NAs introduced by coercion
2: In knn(trainDS, valDS, train_labels, k = k) : NAs introduced by coercion
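Since I cannot share the full dataset, here is a minimal sketch with made-up data that should raise the same error. It assumes the trigger is that a non-numeric column (here the character targetVar; in my real data also time) is still inside the data frames handed to knn(), which I have not confirmed. The toy data frame and the row ranges are hypothetical stand-ins, not my real data.

```r
library(class)  # provides knn()

set.seed(42)

# Made-up stand-in for crypto_data: two numeric features plus a character label.
toy <- data.frame(
  btc_price = rnorm(30),
  eth_price = rnorm(30),
  targetVar = sample(c("buy", "hold"), 30, replace = TRUE),
  stringsAsFactors = FALSE
)

train_rows <- 1:20
val_rows   <- 21:30

# Same call pattern as in knn_predFunc: the label column is still part of
# the train and test data frames passed to knn().
err_msg <- tryCatch(
  suppressWarnings(
    knn(toy[train_rows, ], toy[val_rows, ],
        toy[train_rows, "targetVar"], k = 3)
  ),
  error = function(e) conditionMessage(e)
)

# err_msg holds the error text if the call failed, otherwise the predictions.
print(err_msg)
```

If the assumption holds, err_msg contains the "foreign function call" message, and the suppressed warnings are the "NAs introduced by coercion" ones.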
What does this mean and how can I fix it? I have tried several variations of knn_predFunc that all produce the same error. Also, I initially had separate label sets for train/val/test, but I kept only train_labels after reading an online post. Isn't this wrong? Shouldn't I be feeding each knn call the labels of the corresponding dataset?
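To make the labels question concrete, here is a sketch of what I currently understand the intended usage to be, which may be wrong: knn(train, test, cl, k) takes only the training labels in cl, and the validation labels are used afterwards to score the predictions. The toy data, the feature_cols vector, and the row ranges are hypothetical; only numeric columns are passed as features.

```r
library(class)  # provides knn()

set.seed(42)

# Made-up data: two numeric predictors and a character label column.
toy <- data.frame(
  x1 = rnorm(40),
  x2 = rnorm(40),
  targetVar = sample(c("buy", "hold"), 40, replace = TRUE),
  stringsAsFactors = FALSE
)

feature_cols <- c("x1", "x2")  # numeric predictors only (hypothetical names)
train_rows <- 1:30
val_rows   <- 31:40

# cl gets the training labels only; knn() returns predicted labels
# for the rows given in test.
val_pred <- knn(
  train = toy[train_rows, feature_cols],
  test  = toy[val_rows, feature_cols],
  cl    = factor(toy$targetVar[train_rows]),
  k     = 3
)

# The validation labels are only used here, to evaluate the predictions.
val_labels   <- toy$targetVar[val_rows]
val_accuracy <- mean(as.character(val_pred) == val_labels)

print(table(predicted = val_pred, actual = val_labels))
print(val_accuracy)
```

The same pattern would apply to the test set: predict with the training labels, then compare against the held-out test labels.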