I want to rewrite some of the first lines from this question, and I can't figure out why my sapply line isn't working.

I want to turn these lines:

cols <- sample(c(1:5), 1)
label <- rep(paste0("label ", seq(from=1, to=10)))
mydata <- data.frame(label)
for (i in 1:cols) {mydata[,i+1] <- sample(c(1:10), 10)}

into:

cols <- sample(c(1:5), 1) 
mydata <- data.frame(rep(paste0("label ", seq(1,10))))
sapply(1:cols, function(x) { mydata[,(x+1)] <- sample(c(1:10), 10) } )

but for some reason that sapply line gives me a "new columns would leave holes after existing columns" error, and I don't know why.

I've also tried

sapply(1:cols, function(x) { mydata[,(x+1)] <- sample(c(1:10), 10); mydata } )
Map(function(x, mydata1) {mydata1[,(x+1)] <- sample(c(1:10), 10)}, x = 1:cols, mydata1 = mydata)
wtrs

2 Answers

EDIT:

When you assign a new column to the mydata data frame inside the function, the assignment is local to that function call. Changes made to mydata there do not carry over to the function's parent environment.

To see this effect, use a print statement inside the function.

mydata <- data.frame( label = rep(paste0("label ", seq(1,10))))
sapply( 1:cols, function(x) { 
  mydata[[(x+1)]] <- sample(c(1:10), 10)
  print(mydata)
  } )
mydata

To work around this scoping issue, you can use <<- instead of <-:

sapply(1:cols, function(x) { mydata[,(x+1)] <<- sample(c(1:10), 10) } )

Note: using <<- is strongly discouraged, because it tends to cause confusion later, once your code base grows and your computations involve multiple packages.
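
One way to avoid <<- altogether is to have the function return each new column and do a single assignment in the calling environment. The sketch below is only an illustration: the seed value and the column names "V1", "V2", ... are assumptions and do not come from the question.

```r
set.seed(42)  # arbitrary seed, used only to make the run reproducible
cols <- sample(1:5, 1)
mydata <- data.frame(label = paste0("label ", 1:10))

# Each call returns one new column; nothing is modified inside the function.
new_cols <- lapply(seq_len(cols), function(x) sample(1:10, 10))

# Assign all new columns at once, in the calling environment.
# The names "V1", "V2", ... are hypothetical; use whatever suits your data.
mydata[paste0("V", seq_len(cols))] <- new_cols

str(mydata)
```

Because the assignment happens outside the function, mydata really does change, and lapply is used only for what it returns.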

Possible Solution:

You have to take the output of the sapply call and column-bind it to mydata.

Try this:

set.seed(1L)
cols <- sample(c(1:5), 1) 
print(cols) # [1] 2
mydata <- data.frame( label = rep(paste0("label ", seq(1,10))))
do.call("cbind",
        list( mydata,
              sapply( seq_len(cols), function(x) sample(c(1:10), 10) )
        ))

Output:

#     label    1  2
# 1   label 1  4  2
# 2   label 2  6  7
# 3   label 3  8  4
# 4   label 4  2  6
# 5   label 5  9  3
# 6   label 6  5  8
# 7   label 7  3  5
# 8   label 8  7 10
# 9   label 9  1  9
# 10 label 10 10  1
Sathish

I was not able to determine exactly why your code wasn't working, but it has something to do with the columns not being defined before you run sapply. If you define your data.frame with all of its columns beforehand, the error goes away:

cols <- sample(c(1:5), 1) 
mydata <- data.frame(matrix(rep(0, 10*(cols+1)), ncol = cols+1))
mydata[, 1] <- rep(paste0("label ", seq(1,10)))
sapply(1:cols, function(x) {
  mydata[, x+1] <- sample(c(1:10), 10) } )
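
A likely explanation for the original error, sketched below: every call of the function starts from its own copy of the one-column mydata, so the call with x = 2 tries to create column 3 while column 2 does not exist in that copy. The helper name add_col is made up for this illustration.

```r
mydata <- data.frame(label = paste0("label ", 1:10))

# Hypothetical helper mirroring the anonymous function in the question.
add_col <- function(x) {
  # The assignment modifies a copy of mydata that is local to this call.
  mydata[, x + 1] <- sample(1:10, 10)
  ncol(mydata)  # number of columns in the local copy
}

add_col(1)  # column 2 directly follows column 1, so this succeeds

# Column 3 would leave a gap after column 1, so this call fails;
# tryCatch captures the error message instead of stopping the script.
res <- tryCatch(add_col(2), error = function(e) conditionMessage(e))
res

ncol(mydata)  # the global mydata still has only its label column
```

This would also explain why the error only shows up when cols happens to be 2 or more, and why pre-allocating the columns makes it disappear without actually changing mydata outside the function.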

EDIT:

You can use the following code instead:

cols <- sample(c(1:5), 1) 
mydata <- data.frame(rep(paste0("label ", seq(1,10))),
                     sapply(1:cols, function(x) {sample(c(1:10), 10) } ))
smanski
  • Thanks, this is interesting. It definitely has something to do with how a dataframe is technically a list. The output of that sapply function is a matrix, though, and doesn't have any of the labels. – wtrs Mar 23 '18 at 21:04
  • This works, though: `cols <- sample(c(1:5), 1); mydata <- matrix(rep(0, 10*(cols)), ncol = cols); mydata <- sapply(1:cols, function(x) mydata[, x] <- sample(c(1:10), 10)); mydata <- data.frame(rep(paste0("label ", seq(1,10))), mydata)` – wtrs Mar 23 '18 at 21:04
  • @smanski provided me with a very elegant solution in two lines that works fine for me. Your answer was helpful too, which is why I upvoted it, but your solution is much less concise. – wtrs Mar 23 '18 at 21:54
  • Yes, I'm talking specifically about the portion of @smanski's answer that he added in an edit. – wtrs Mar 23 '18 at 22:26