Right now I have a training loop to test different model parameters for the kknn package, and it looks like this:

# generate validation results
kernel  <- c('gaussian', 'optimal', 'rectangular', 'biweight', 'cos', 'inv', 'triangular', 'epanechnikov')
# empty array to hold the results
results <- array(dim = c(length(kernel)*50, 4), dimnames = list(NULL, c('K', 'MSE', 'MAE', 'KERNEL')))
start = 1
stop  = 50

# run the loop
for (i in kernel) {
    model <- train.kknn(R1~., data, kmax = 50, kernel = i)
    results[start:stop, 1] = 1:50
    results[start:stop, 2] = model$MEAN.SQU
    results[start:stop, 3] = model$MEAN.ABS
    results[start:stop, 4] = i
    start = start + 50
    stop  = stop + 50
}

This works fine enough. However, I want to eventually use the summarize function in dplyr to look at my model results, but the main problem I'm running into is that the values in results seem to all be strings.

If I call typeof on each column in results it returns character, but I would assume it should return double instead.

If I run results %>% group_by(K) %>% summarize(mean_val = mean(MSE)) then I get the error message

Error in UseMethod("group_by_"): no applicable method for 'group_by_' applied to an object of class "c('matrix', 'character')"

which I assume means that you can't groupby on something without numeric values.

Any tips on what I'm doing incorrectly would be much appreciated. Thank you!

EDIT

It was noted in the comments that dplyr commands only work with a data.frame or a tibble. However, converting the results array into either of these does not solve the problem.

If I run the line:

results = data.frame(results)

Running str(results) returns the following picture:

[![enter image description here][1]][1]

I get something similar for using as_tibble in place of data.frame.

Running the dplyr commands gives the following error message:

Warning message in mean.default(MSE): "argument is not numeric or logical: returning NA"

So I think I'm still about where I started.

Thank you.

[1]: https://i.stack.imgur.com/QsO7o.png

Jonathan Bechtel
  • Right now your `results` variable is an array. The dplyr functions only work with data.frames/tibbles. You can use the `as_tibble` function to convert your array to a tibble. – MrFlick Sep 07 '21 at 17:48
  • @MrFlick this was not mentioned in the original question, but passing the array into a data.frame also resulted in all of the columns being cast as factors rather than numbers. – Jonathan Bechtel Sep 07 '21 at 23:14
  • Arrays can only hold one data type. You can't have both strings and numbers in an array like you can in a data.frame. Once you store one string in there, it all becomes strings. You really need to avoid the array in the first place. Since we don't have the data, we can't run the code to test it. It's easier to help you when you provide a minimal [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Sep 07 '21 at 23:47
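
To illustrate the point made in the comments, here is a minimal base-R sketch of the coercion and of one possible way around it. It is not the asker's actual run: the two-kernel list, `kmax = 3`, and the `NA_real_` placeholders are assumptions chosen so the code runs without the original data or the kknn package. In the real loop, the placeholder columns would be filled from the fitted model (for example `model$MEAN.SQU` and `model$MEAN.ABS`, as in the question).

```r
# A matrix/array has a single storage type, so one string coerces every cell.
m <- matrix(NA_real_, nrow = 2, ncol = 2,
            dimnames = list(NULL, c("K", "KERNEL")))
m[, "K"] <- 1:2
typeof(m)                      # "double"
m[, "KERNEL"] <- "gaussian"
typeof(m)                      # "character"

# A data.frame stores each column with its own type, so build one
# small data.frame per kernel and bind them together at the end.
kernels <- c("gaussian", "optimal")   # shortened list, for illustration only
kmax <- 3                             # assumed small value for the sketch

per_kernel <- lapply(kernels, function(k) {
  # Placeholders: in the real loop these would come from the fitted model.
  data.frame(K      = seq_len(kmax),
             MSE    = rep(NA_real_, kmax),
             MAE    = rep(NA_real_, kmax),
             KERNEL = k,
             stringsAsFactors = FALSE)
})

results_df <- do.call(rbind, per_kernel)

# K is integer, MSE and MAE are double, KERNEL is character.
sapply(results_df, typeof)
```

Because `results_df` is a data.frame with numeric `MSE` and `MAE` columns, a pipeline such as `group_by(K)` followed by `summarize(mean_val = mean(MSE))` should then have numeric input to work with.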