I started studying machine learning a few days ago and am trying to use kNN to predict accident severity from the following features: latitude, longitude, number of vehicles, number of casualties, day of week, and period of day. The original dataset and analysis can be seen in a DataCamp workspace.

After some data preparation, I split the data into `train_data` and `test_data` and passed them to the `knn` function, but I am getting an error.

My code:

```r
library(class)

n_accidents <- nrow(accidents_ml)
train_rows <- sample(n_accidents, 0.7 * n_accidents)

train_data <- accidents_ml[train_rows, -1]
train_data_labels <- accidents_ml[train_rows, 1]

test_data <- accidents_ml[-train_rows, -1]

accidents_prev_1 <- knn(train = train_data, test = test_data, cl = train_data_labels)
```

Error:

```
Error in knn(train = train_data, test = test_data, cl = train_data_labels) :
  'train' and 'class' have different lengths
```

I made sure that the dataset does not have any missing values. I also tried passing `test_data` as the training set and using only numeric variables, but I still get the error.
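
One possible cause, sketched under assumptions the question does not confirm: if `accidents_ml` is a tibble (for example, read with `readr::read_csv`), then `accidents_ml[train_rows, 1]` stays a one-column data frame. `knn` compares `length(cl)` with the number of training rows, and the length of a data frame is its number of columns. The toy data frame and its column names below are hypothetical, and `drop = FALSE` is used only to mimic tibble subsetting in base R.

```r
library(class)

set.seed(42)  # reproducible toy data and split

# Hypothetical stand-in for accidents_ml: label in column 1, numeric features after
toy <- data.frame(
  severity   = factor(sample(c("Slight", "Serious"), 100, replace = TRUE)),
  latitude   = runif(100, 50, 58),
  longitude  = runif(100, -5, 1),
  n_vehicles = sample(1:4, 100, replace = TRUE)
)

n <- nrow(toy)
train_rows <- sample(n, floor(0.7 * n))

train_data <- toy[train_rows, -1]
test_data  <- toy[-train_rows, -1]

# drop = FALSE mimics tibble behaviour: the result stays a one-column
# data frame, so length() counts columns, not rows
labels_df <- toy[train_rows, 1, drop = FALSE]
length(labels_df)  # 1

# Extracting the column as a vector gives one label per training row
labels_vec <- toy[[1]][train_rows]
length(labels_vec) == nrow(train_data)  # TRUE

pred <- knn(train = train_data, test = test_data, cl = labels_vec, k = 5)
table(pred)
```

If this is what is happening, passing `labels_df` as `cl` reproduces the "'train' and 'class' have different lengths" error, while the vector form does not. Checking `class(train_data_labels)` and `length(train_data_labels)` on the real data would confirm or rule this out.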

  • One of the reasons we suggest you use something like `dput` to include a sample of data (that's listed at the top of the [tag:r] tag and [here](https://stackoverflow.com/q/5963269/5325862)) is that even if the external link to the tutorial doesn't change over time, it in turn links to a download page with about 40 links to datasets. I can figure out which one to download by reading the tutorial, but you should make it as easy as possible to get a representative sample of the data – camille Dec 13 '21 at 22:53
  • Convert your `train` and `test` from data frames to matrices with `as.matrix`. It will most likely solve your problem. – Anoushiravan R Dec 13 '21 at 23:57
  • Hi, Camille! Thanks for the tips. I will try to improve my question later. – GregOliveira Dec 14 '21 at 09:16
  • Anoushiravan R, I tried, but got another error: `Error in knn(train = train_data, test = test_data, cl = train_data_labels) : NA/NaN/Inf in foreign function call (arg 6)`. I am trying to use some factors, and maybe this is the problem. I will try to remove these factors and transform them into integers. Thanks for your help! – GregOliveira Dec 14 '21 at 09:17
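
The factor-to-integer conversion mentioned in the last comment could look roughly like the sketch below. It assumes the second error comes from non-numeric columns: `as.matrix` on a data frame that mixes factors and numbers yields a character matrix, which cannot be converted to the numeric input `knn` needs. The toy data and column names are hypothetical.

```r
library(class)

set.seed(1)  # reproducible toy data and split

# Hypothetical data with one factor feature, standing in for day of week
toy <- data.frame(
  severity    = factor(sample(c("Slight", "Serious"), 60, replace = TRUE)),
  n_vehicles  = sample(1:4, 60, replace = TRUE),
  day_of_week = factor(sample(c("Mon", "Tue", "Wed"), 60, replace = TRUE))
)

features <- toy[, -1]

# Mixed factor/numeric columns become a character matrix
is.character(as.matrix(features))  # TRUE

# Replace each factor column with its integer codes, then rescale so that
# no single column dominates the Euclidean distance
features[] <- lapply(features, function(col) {
  if (is.factor(col)) as.integer(col) else col
})
features <- scale(features)

train_rows <- sample(nrow(toy), 40)
pred <- knn(
  train = features[train_rows, ],
  test  = features[-train_rows, ],
  cl    = toy$severity[train_rows],
  k     = 3
)
length(pred)  # 20, one prediction per test row
```

Integer codes impose an ordering on the categories that may not be meaningful for a distance-based method. Dummy variables, for example via `model.matrix`, are a common alternative.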
