
I am trying to use the knn() function from the class package to solve a problem. I have split the iris dataset into 50% training data and 50% test data, and I want to predict the variety variable from sepal width and petal width. My knn() call is as follows:

> predictions <- knn(iris.train[, c(1:2)], iris.test[, c(1:2)], iris.train[, 3], k = 10)

Here, columns 1 and 2 of iris.train and iris.test are sepal width and petal width, and column 3 of both datasets is the variety variable, stored as a factor. I consistently get the error 'train' and 'class' have different lengths. When I check the dimensions of what I pass into the function, this is what I get:

> dim(iris.train[, c(1:2)])
[1] 75  2

> dim(iris.test[, c(1:2)])
[1] 75  2

> dim(iris.train[, 3])
[1] 75  1

So I assume I'm missing something. How can I resolve the error about 'train' and 'class' having different lengths? Thank you to anyone who can help!

    Can you provide your full code? Are you using the `class` package? I'm not getting any errors. – Phil Feb 18 '22 at 17:24

1 Answer


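The cl argument must be a factor (or vector) whose length equals the number of rows in train. Your dim() output shows that iris.train[, 3] is still a 75-by-1 data frame rather than a vector. This most likely means iris.train is a tibble (for example, one created with readr::read_csv()), because subsetting a tibble with [ does not drop a single column down to a vector. The length() of a data frame is its number of columns, so length(iris.train[, 3]) is 1, not the 75 rows in train.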
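A small base-R sketch of the length mismatch, using a hypothetical toy data frame `df` in place of `iris.train`. `drop = FALSE` is used here to mimic what `[, j]` returns on a tibble, so the example does not depend on the tibble package:

```r
# Hypothetical toy data standing in for iris.train
df <- data.frame(x = 1:5, y = letters[1:5])

one.col <- df[, 2, drop = FALSE]  # one-column data frame, like a tibble's df[, 2]
length(one.col)   # 1: length() of a data frame counts columns, not rows
nrow(one.col)     # 5
length(df[[2]])   # 5: [[ ]] extracts the column as a plain vector
```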

Try this:

predictions <- knn(iris.train[, c(1:2)], iris.test[, c(1:2)], iris.train[[3]], k = 10)
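For reference, a minimal end-to-end sketch. It assumes R's built-in `iris` data, whose columns are named `Sepal.Width`, `Petal.Width` and `Species` (the question's data calls the label `variety`), and a random 75/75 split with an arbitrary seed. The construction of `iris.sub`, `iris.train` and `iris.test` is illustrative, not the asker's actual code:

```r
library(class)  # provides knn(); ships with R as a recommended package

set.seed(42)  # arbitrary seed, only for reproducibility

# Assumption: built-in iris, reordered so columns 1:2 are the predictors
# and column 3 is the class label
iris.sub <- iris[, c("Sepal.Width", "Petal.Width", "Species")]
train.idx <- sample(nrow(iris.sub), nrow(iris.sub) / 2)  # 75 of 150 rows
iris.train <- iris.sub[train.idx, ]
iris.test  <- iris.sub[-train.idx, ]

# cl needs one label per training row, so extract column 3 with [[ ]]
predictions <- knn(iris.train[, 1:2], iris.test[, 1:2], iris.train[[3]], k = 10)

# Proportion of test rows classified correctly
mean(predictions == iris.test[[3]])
```

Because knn() breaks voting ties at random, the predictions can vary slightly between runs unless the seed is fixed.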

  • Thank you for your help! That worked for me. Just out of curiosity, what does the double bracket mean in R? When I say "iris.train[[3]]", what does R interpret that to mean? – carson.schroer Feb 18 '22 at 17:58
  • Note that `iris.train` is a list (a data frame is a list of columns). When we use `[`, we get a list of the selected elements. For example, `iris.train[c(1,3,5)]` returns a list (here, a data frame) of the selected columns of `iris.train`. Likewise, `iris.train[3]` also returns a list of the selected elements (in this case, a single column). However, `iris.train[[3]]` returns the third element itself, which in this case is a vector of values. My explanation here is insufficient; it's best to check out this post: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el – langtang Feb 18 '22 at 18:19
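To see the `[` versus `[[` distinction from the comment above in action, here is a short sketch on a hypothetical three-column data frame `df`, standing in for `iris.train`:

```r
# Hypothetical three-column data frame standing in for iris.train
df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(TRUE, FALSE, TRUE))

is.list(df)               # TRUE: a data frame is a list of equal-length columns
class(df[3])              # "data.frame": [ keeps the container, here one column
class(df[[3]])            # "logical": [[ extracts the element itself
identical(df[[3]], df$c)  # TRUE: $ is [[ by name
```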