I started studying machine learning a few days ago, and I am trying to apply k-nearest neighbors (kNN) to predict accident severity from the following variables: latitude, longitude, number of vehicles, number of casualties, day of week, and period of day. The original dataset and analysis can be seen in a DataCamp workspace.
After some data preparation, I split the data into `train_data` and `test_data` and passed them to the `knn()` function, but I get an error.
My code:

```r
library(class)

n_accidents <- nrow(accidents_ml)
# random 70/30 split of the row indices
train_rows <- sample(n_accidents, 0.7 * n_accidents)
# column 1 holds the severity label; the remaining columns are the features
train_data <- accidents_ml[train_rows, -1]
train_data_labels <- accidents_ml[train_rows, 1]
test_data <- accidents_ml[-train_rows, -1]

accidents_prev_1 <- knn(train = train_data, test = test_data, cl = train_data_labels)
```
Error:

```
Error in knn(train = train_data, test = test_data, cl = train_data_labels) :
  'train' and 'class' have different lengths
```
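One likely cause, assuming `accidents_ml` is a tibble (for example, read with readr or built with dplyr; the post does not say): a tibble does not drop to a vector when a single column is selected with `[rows, 1]`, so `train_data_labels` would be a one-column tibble whose `length()` is 1, while `knn()` requires `length(cl)` to equal `nrow(train)`. The sketch below reproduces this with the built-in `iris` data (label moved to column 1) rather than the accidents data, and uses `drop = FALSE` on a base data frame to mimic the tibble behavior:

```r
library(class)

# iris stands in for accidents_ml (assumption): the label (Species) is moved to
# column 1 so the indexing matches the question's code.
df <- iris[, c(5, 1:4)]
set.seed(1)
# round() guards against floating-point truncation of the sample size
train_rows <- sample(nrow(df), round(0.7 * nrow(df)))
train_data <- df[train_rows, -1]

# drop = FALSE mimics what a tibble returns for [rows, 1]:
# a one-column data frame, whose length() counts columns, not rows.
labels_as_frame <- df[train_rows, 1, drop = FALSE]
length(labels_as_frame)  # 1
nrow(train_data)         # 105

# knn() compares length(cl) with nrow(train), so this call stops with an error.
msg <- tryCatch(
  knn(train = train_data, test = df[-train_rows, -1], cl = labels_as_frame),
  error = function(e) conditionMessage(e)
)
print(msg)
```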
I have made sure that the dataset has no missing values. I also tried passing `test_data` as the training set and using only the numeric variables, but I still get the same error.
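A likely fix, assuming the labels come back as a one-column tibble rather than a vector, is to extract the label column with `[[1]]` so that `cl` has one entry per training row. A minimal sketch of the same 70/30 split, using the built-in `iris` data as a stand-in for `accidents_ml` (label moved to column 1; `k = 5` is an arbitrary choice, not taken from the post):

```r
library(class)

# iris stands in for accidents_ml (assumption); label moved to column 1.
df <- iris[, c(5, 1:4)]
set.seed(1)
train_rows <- sample(nrow(df), round(0.7 * nrow(df)))

train_data <- df[train_rows, -1]
test_data  <- df[-train_rows, -1]

# [[1]] returns the column as a plain vector for both data frames and tibbles,
# so the labels line up one-to-one with the training rows.
train_labels <- df[[1]][train_rows]
test_labels  <- df[[1]][-train_rows]
stopifnot(length(train_labels) == nrow(train_data))

pred <- knn(train = train_data, test = test_data, cl = train_labels, k = 5)
accuracy <- mean(pred == test_labels)  # share of correctly classified test rows
print(accuracy)
```

With a tibble, `dplyr::pull(accidents_ml, 1)[train_rows]` or `accidents_ml[train_rows, 1, drop = TRUE]` should give the same vector. This would also explain why switching to numeric-only features did not help: the check that fails compares the labels with `train`, not the feature columns themselves.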