
Why doesn't this example work? Every row ends up with the same value, and there is a warning as well.

data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))
data[,c("d", "e")] <- sapply(data$id, function(id) {
 tmp <- slowCall(id)
 list(sum(tmp$b), min(tmp$c))
})

Warning message:
In `[<-.data.frame`(`*tmp*`, , c("d", "e"), value = list(3L, 0.104784948984161,  :
 provided 20 variables to replace 2 variables
print(data)
   id d         e
1   1 3 0.1047849
2   2 3 0.1047849
3   3 3 0.1047849
4   4 3 0.1047849
5   5 3 0.1047849
6   6 3 0.1047849
7   7 3 0.1047849
8   8 3 0.1047849
9   9 3 0.1047849
10 10 3 0.1047849
Blue Magister
user2908955
  • It does "work", just not the way you would like it to work. It is working exactly as it _should_ work ;) Have a search through Stack Overflow for `[r] data.table` – Ricardo Saporta Oct 22 '13 at 21:40
  • I was looking for how to do it especially with data.frame instead of data.table. I'm also not saying that it should work as I expected but I would like to understand why I'm getting results looking like that. – user2908955 Oct 23 '13 at 07:24

3 Answers


You could try something like this. First, vectorize the assign function (per @Joran's answer here), then modify your code slightly.

# vectorize
assignVec <- Vectorize("assign",c("x","value"))

library(plyr)
set.seed(1) # this is just here for reproducibility

data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

# I store this as `tmp` just to make the code below look cleaner
tmp <- mlply(sapply(data$id, function(id) {
    tmp <- slowCall(id)
    list(sum(tmp$b), min(tmp$c))
}), c)

# here's the key part:
data <- within(data, assignVec(c('d','e'), tmp, envir=environment()))

Output:

> data
   id          e  d
1   1 0.26550866  3
2   2 0.20168193  6
3   3 0.62911404  9
4   4 0.06178627 12
5   5 0.38410372 15
6   6 0.49769924 18
7   7 0.38003518 21
8   8 0.12555510 24
9   9 0.01339033 27
10 10 0.34034900 30

Note: I invoke plyr::mlply to get your sapply output into a list.

The simpler answer, though, is to change the right-hand side of your assignment to:

data[,c("d", "e")] <- as.data.frame(t(sapply(data$id, function(id) {
 tmp <- slowCall(id)
 list(sum(tmp$b), min(tmp$c))
})))

which would give you the same result.
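
To see how the new columns are stored after this assignment, here is a minimal sketch that reuses `data` and `slowCall` from the question. It assumes (this is not stated in the answer) that building the data frame from a matrix of list elements can leave `d` and `e` as list columns; the `unlist` step is one hedged way to flatten them, and is harmless if they are already atomic.

```r
set.seed(1)  # reproducibility only
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

data[, c("d", "e")] <- as.data.frame(t(sapply(data$id, function(id) {
  tmp <- slowCall(id)
  list(sum(tmp$b), min(tmp$c))
})))

# Inspect how each column is stored
sapply(data, class)

# Flatten d and e into atomic vectors (a no-op for columns that already are)
data[c("d", "e")] <- lapply(data[c("d", "e")], unlist)
sapply(data, class)
```

Checking `sapply(data, class)` before and after makes any list columns visible; they print like ordinary columns but can trip up later arithmetic or file export.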

Community
Thomas
  • Why is transposing needed here? When assigning a single column, for example, it populates it just fine without transposing. I thought the apply family of functions translates roughly to the map function found in functional languages, but returning a list or vector seems to be some kind of special case here. – user2908955 Oct 23 '13 at 09:15
  • @user2908955 Take a look at the output of your `sapply` call when it contains only one return value versus two. Your current output is a matrix, which has to be coerced to fit into your dataframe correctly, whereas with one return value, it is simply a vector that will be added as a column quite easily. – Thomas Oct 23 '13 at 09:19

The problem here is that the matrix returned by your sapply contains one-element lists instead of numeric values. Change your list to a c and transpose the output, then it will work.

data[, c("d", "e")] <- t(sapply(data$id, function(id) {
  tmp <- slowCall(id)
  c(sum(tmp$b), min(tmp$c))
}))
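
The difference between the two return types can be inspected directly. The sketch below reuses `data` and `slowCall` from the question; `asList` and `asVec` are names introduced here for illustration only.

```r
set.seed(1)  # reproducibility only
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

# Returning a list per id: sapply builds a matrix whose cells are list elements
asList <- sapply(data$id, function(id) {
  tmp <- slowCall(id)
  list(sum(tmp$b), min(tmp$c))
})
dim(asList)      # one row per returned value, one column per id
is.list(asList)  # the matrix is still a list underneath
length(asList)   # number of list elements offered to the assignment

# Returning c(...) per id and transposing: a plain numeric matrix, one row per id
asVec <- t(sapply(data$id, function(id) {
  tmp <- slowCall(id)
  c(sum(tmp$b), min(tmp$c))
}))
dim(asVec)
is.numeric(asVec)
```

Because `asList` is a list of 20 elements, the data frame assignment treats it as 20 candidate columns, which matches the "provided 20 variables to replace 2 variables" warning in the question: only the first two elements are used, and each is recycled down every row.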
shadow

Here's a generic method to add two columns of different data types (e.g. character and numeric). It uses lists and transposes lists (via this answer).

In this case, it preserves the integer and numeric types of the two outputs.

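# Build one list of results per row; list() keeps each value's own type
rowwise <- lapply(data$id, function(id) {
  tmp <- slowCall(id)
  list(sum(tmp$b), min(tmp$c))
})
# Transpose: gather the i-th result of every row into one atomic vector per column
# (sapply rather than lapply here, so the new columns are not list columns)
colwise <- lapply(seq_along(rowwise[[1]]), function(i) sapply(rowwise, "[[", i))

data[,c("d", "e")] <- colwise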
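
The description above mentions character and numeric outputs, so here is a minimal sketch of that case using the same list-transposition idea. `labelCall` and the column names `label` and `half` are hypothetical stand-ins, not part of the original question. The inner step uses `sapply` so that each new column comes out as an atomic vector.

```r
data <- data.frame(id = 1:10)

# Hypothetical function returning one character and one numeric value per id
labelCall <- function(id) list(paste0("id_", id), id / 2)

# One list of results per row
rowwise <- lapply(data$id, labelCall)

# Transpose: one atomic vector per output position
colwise <- lapply(seq_along(rowwise[[1]]), function(i) sapply(rowwise, "[[", i))

data[, c("label", "half")] <- colwise
sapply(data, class)  # check that each column kept its own type
```

Had the function returned `c(...)` instead of `list(...)`, the numeric value would have been coerced to character, which is why the per-row results are kept as lists until they are split by column.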
Community
Blue Magister