The place to start is by factoring your code into chunks, where each new function works on each level of the data. Then you can call each piece from the other and collect the results in a more idiomatic way.
Here I made functions for 1) the core code for each train/test pair, 2) repeating that for each desired K, and 3) repeating that across the possible pairs.
I agree with @Deja that restructuring your data to a more "tidyverse" style method could result in even more intuitive code, but if you're not used to thinking in that way, this is probably clearer.
## run core code for a particular train/test pair
run1 <- function(train, test, K, seeds) {
set.seed(seeds)
train <- as.data.frame(train)
test <- as.data.frame(test)
kpreds <- class::knn(train[, -ncol(train)],test[,-ncol(test)], train[,ncol(train)],k=K)
Ktable <- table(kpreds ,test[, ncol(test)])
(Ktable[1, 2] + Ktable[2, 1]) / sum(Ktable)
}
## run a particular train/test pair at several values of K
runK <- function(train, test, Ks, seeds) {
errors <- sapply(Ks, function(K) run1(train, test, K, seeds))
data.frame(K=Ks, error=errors)
}
## test several train/test pairs, at several values of K
findKNN <- function(df, Ks=1:5, seeds){
stopifnot(length(df$train)==length(df$test))
out <- lapply(seq_along(df$train), function(i) {
cbind(i=i, runK(df$train[[i]], df$test[[i]], Ks, seeds))
})
out <- do.call(rbind, out)
aggregate(error ~ K, out, mean)
}
## loop over several sets of data
output <- lapply(flist, seeds=12345, findKNN)
To put the data in a more "tidy" format, you'd have one row per test/train pair with additional columns for which data set and which rep it is. A little awkward to get there from what you started with, but here's what it would look like.
n <- sapply(lapply(flist, `[[`, "train"), length)
ftrain <- do.call(c, lapply(flist, `[[`, "train"))
ftest <- do.call(c, lapply(flist, `[[`, "test"))
nn <- rep(names(n), n)
ii <- unlist(lapply(n, function(i) seq_len(i)))
library(tidyverse)
alld <- tibble(data=nn, i=ii, train=ftrain, test=ftest)
alld
## # A tibble: 6 x 4
## data i train test
## <chr> <int> <list> <list>
## 1 iris 1 <data.frame [150 x 5]> <data.frame [6 x 5]>
## 2 iris 2 <data.frame [150 x 5]> <data.frame [6 x 5]>
## 3 iris 3 <data.frame [150 x 5]> <data.frame [6 x 5]>
## 4 iris2 1 <data.frame [150 x 5]> <data.frame [6 x 5]>
## 5 iris2 2 <data.frame [150 x 5]> <data.frame [6 x 5]>
## 6 iris2 3 <data.frame [150 x 5]> <data.frame [6 x 5]>
You'd then loop through each row. (Note to make this work I had to make the result of runK be a data.frame.)
out <- alld %>% mutate(error=map2(train, test, runK, Ks=1:5, seeds=12345))
out
## # A tibble: 6 x 5
## data i train test error
## <chr> <int> <list> <list> <list>
## 1 iris 1 <data.frame [150 x 5]> <data.frame [6 x 5]> <data.frame [5 x 2]>
## 2 iris 2 <data.frame [150 x 5]> <data.frame [6 x 5]> <data.frame [5 x 2]>
## 3 iris 3 <data.frame [150 x 5]> <data.frame [6 x 5]> <data.frame [5 x 2]>
## 4 iris2 1 <data.frame [150 x 5]> <data.frame [6 x 5]> <data.frame [5 x 2]>
## 5 iris2 2 <data.frame [150 x 5]> <data.frame [6 x 5]> <data.frame [5 x 2]>
## 6 iris2 3 <data.frame [150 x 5]> <data.frame [6 x 5]> <data.frame [5 x 2]>
You then take out the original data, "unnest" the error data.frame, and summarize over data set and K.
out %>% select(-train, -test) %>% unnest() %>%
group_by(data, K) %>% summarize(error=mean(error))
## # A tibble: 10 x 3
## # Groups: data [?]
## data K error
## <chr> <int> <dbl>
## 1 iris 1 0
## 2 iris 2 0
## 3 iris 3 0
## 4 iris 4 0
## 5 iris 5 0
## 6 iris2 1 0
## 7 iris2 2 0
## 8 iris2 3 0
## 9 iris2 4 0
## 10 iris2 5 0