Error looping through list: "Error in `[<-.data.frame`(`*tmp*`, , i, value = c(7L, 1L, 4L, 7L, 7L, : new columns would leave holes... "

Question

I'm trying to write a function that loops through a list in order to run kmeans clustering on only specific columns of a dataset. I want the output to be a matrix/dataframe of the cluster membership of each observation when kmeans is run on each set of columns.

Here's a mock dataset and the function I came up with (I'm new to R, so sorry if it's shaky):

```r
set.seed(123)
mydata <- data.frame(a = rnorm(100,0,1), b = rnorm(100,0,1), c = rnorm(100,0,1),
                     d = rnorm(100,0,1), e = rnorm(100,0,1))

set.seed(123)
my.kmeans <- function(data,k,...) {
    clusters <- data.frame(matrix(nrow = nrow(data),
                                  ncol = length(list(...)))) # set up dataframe for clusters
    for(i in list(...)) {
        kmeans <- kmeans(data[,i],centers = k)
        clusters[,i] <- kmeans$cluster
    }
    colnames(clusters) <- list(...)
    clusters
}
```

My question is: this seems to work when I only ask it to use consecutive columns, but not when I ask it to skip some. For instance, the first of the following calls works, but the second fails with the error above. Any idea how I can fix this?

```r
# works how I want
head(my.kmeans(data = mydata, k = 8, c(1,2), c(2,3), c(1,2,3)))

# doesn't work
head(my.kmeans(data = mydata, k = 8, c(1,2), c(2,3), c(1,2,5)))
```
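The error message can be reproduced without kmeans at all. In this minimal sketch, the 2-row, 3-column data frame `df` is a hypothetical stand-in for `clusters` when three column sets are passed. It also suggests why the first call only appears to work: assigning one vector to several columns recycles it into each of them, so with `c(1,2,3)` as the last set, every column of `clusters` ends up holding the last kmeans result.

```r
# Hypothetical stand-in for `clusters`: 2 rows, 3 empty (NA) columns
df <- data.frame(matrix(nrow = 2, ncol = 3))

# Assigning to columns 1, 2 and 3 succeeds, but the same vector is
# recycled into every selected column
df[, c(1, 2, 3)] <- c(7L, 1L)
print(df)

# Column 5 does not exist and column 4 would be skipped, so this fails
msg <- tryCatch({
  df[, c(1, 2, 5)] <- c(7L, 1L)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```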

Also, I know people recommend using apply functions and staying away from for loops, but I don't know how to do this with an apply function. Any advice on that would be much appreciated as well.
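One way to avoid the explicit loop is `vapply()`, which calls a function once per element of a list and collects the results. This is a sketch rather than the only option: `col_sets` is a hypothetical name for the list of column sets, and the code assumes each kmeans call returns one integer cluster label per row, which `vapply()` checks against `integer(nrow(mydata))`.

```r
set.seed(123)
mydata <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100),
                     d = rnorm(100), e = rnorm(100))

# Hypothetical list of column sets to cluster on
col_sets <- list(c(1, 2), c(2, 3), c(1, 2, 5))

set.seed(123)  # make the random kmeans starts reproducible
# vapply() runs kmeans once per column set and binds the integer
# cluster vectors into a matrix, one column per set
clusters <- vapply(col_sets,
                   function(cols) kmeans(mydata[, cols], centers = 8)$cluster,
                   integer(nrow(mydata)))

# Readable column names such as "1_2_5"
colnames(clusters) <- vapply(col_sets, paste, character(1), collapse = "_")
head(clusters)
```

If a data frame is preferred over a matrix, `as.data.frame(clusters)` converts it.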

Thanks so much!

Danny

The problem is in this part of the code: `clusters[,i] <- kmeans$cluster`, because in your second case `i` resolves to `c(1,2,5)`, which includes column 5. – SatZ, Jul 10 '18 at 07:14
Thanks so much @SatZ! Could you explain why i resolves to 5? And how I might get around this? Sorry, I'm pretty new to R. Thanks a lot! – Danny, Jul 10 '18 at 19:04
For anyone who's following (though this is pretty specific, so I doubt it), I think I figured it out: you have to change `for(i in list(...))` to `for(i in 1:length(list(...)))`; that way, when you subset `clusters` with `i` later, it fills in correctly (the column set itself then has to be looked up as `list(...)[[i]]`). Thanks @SatZ – Danny, Jul 11 '18 at 19:23
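A sketch of that fix while keeping the original `...` interface. Iterating over `list(...)` binds `i` to each vector of column numbers, so `clusters[,i]` treats those numbers as columns of `clusters`; iterating over `seq_along()` positions binds `i` to 1, 2, 3 instead. Changing only the loop header is not quite enough, because `data[,i]` would then cluster on column `i` alone, so the set is looked up as `sets[[i]]` (the local name `sets` is an assumption, not from the original).

```r
set.seed(123)
mydata <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100),
                     d = rnorm(100), e = rnorm(100))

# Looping over the list binds i to each element, a vector of column numbers
for (i in list(c(1, 2), c(2, 3), c(1, 2, 5))) print(i)
# [1] 1 2
# [1] 2 3
# [1] 1 2 5

my.kmeans <- function(data, k, ...) {
  sets <- list(...)  # each element is a vector of column numbers
  clusters <- data.frame(matrix(nrow = nrow(data), ncol = length(sets)))
  # i runs over positions 1, 2, 3, ... so each set gets its own column
  for (i in seq_along(sets)) {
    # cluster on the columns in set i, not on column i
    fit <- kmeans(data[, sets[[i]]], centers = k)
    clusters[, i] <- fit$cluster
  }
  colnames(clusters) <- sets
  clusters
}

set.seed(123)
res <- my.kmeans(data = mydata, k = 8, c(1, 2), c(2, 3), c(1, 2, 5))
head(res)
```

`seq_along()` is used instead of `1:length()` because it yields an empty loop, rather than `1:0`, when no column sets are passed.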

score 1 · Answer 1 · answered Jul 11 '18 at 19:31


Building on @SatZ's comments,

```r
set.seed(123)
mydata <- data.frame(a = rnorm(100,0,1), b = rnorm(100,0,1), c = rnorm(100,0,1),
                     d = rnorm(100,0,1), e = rnorm(100,0,1))
mylist <- list(c(1,2), c(2,3), c(1,2,5))

set.seed(123)
my.kmeans <- function(data,k,list) {
  # set up dataframe for clusters: one column per column set
  clusters <- data.frame(matrix(nrow = nrow(data), ncol = length(list)))
  for(i in 1:length(list)) {
      # cluster on the columns in set i, store the labels in column i
      fit <- kmeans(data[,list[[i]]],centers = k)
      clusters[,i] <- fit$cluster
  }
  colnames(clusters) <- list
  clusters
}

head(my.kmeans(data = mydata, k = 8, list = mylist))
```

Danny

You could look at these for more details on how to use the ellipsis (`...`): https://stackoverflow.com/questions/5890576/usage-of-three-dots-or-dot-dot-dot-in-functions https://stackoverflow.com/questions/13353847/how-to-expand-an-ellipsis-argument-without-evaluating-it-in-r – SatZ Jul 12 '18 at 04:19
Definitely. I just thought it would be easier to follow `length(list)` than `length(list(...))`. Thanks so much for your help! – Danny Jul 12 '18 at 14:46
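As a small illustration of the ellipsis behaviour those links cover: `list(...)` evaluates and collects everything passed through `...`, and any names the caller supplies are kept. The helper `dots_info()` is hypothetical; named column sets like these could, for example, supply readable column names for `clusters`.

```r
# Hypothetical helper: reports what was passed through `...`
dots_info <- function(...) {
  args <- list(...)           # evaluate and collect every argument in ...
  list(n = length(args),      # how many column sets were passed
       names = names(args))   # caller-supplied names, or NULL if none
}

info <- dots_info(ab = c(1, 2), bc = c(2, 3), abe = c(1, 2, 5))
info$n      # 3
info$names  # "ab" "bc" "abe"
```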
