Use object names within a list in lapply/ldply

Question

In attempting to answer a question earlier, I ran into a problem that seemed like it should be simple, but I couldn't figure out.

If I have a list of dataframes:

df1 <- data.frame(a=1:3, x=rnorm(3))
df2 <- data.frame(a=1:3, x=rnorm(3))
df3 <- data.frame(a=1:3, x=rnorm(3))

df.list <- list(df1, df2, df3)

That I want to rbind together, I can do the following:

library(plyr)
df.all <- ldply(df.list, rbind)

However, I want another column that identifies which data.frame each row came from. I expected to be able to use the deparse(substitute(x)) method (here and elsewhere) to get the name of the relevant data.frame and add a column. This is how I approached it:

fun <- function(x) {
  name <- deparse(substitute(x))
  x$id <- name
  return(x)
}
df.all <- ldply(df.list, fun)

Which returns

  a          x      id
1 1  1.1138062 X[[1L]]
2 2 -0.5742069 X[[1L]]
3 3  0.7546323 X[[1L]]
4 1  1.8358605 X[[2L]]
5 2  0.9107199 X[[2L]]
6 3  0.8313439 X[[2L]]
7 1  0.5827148 X[[3L]]
8 2 -0.9896495 X[[3L]]
9 3 -0.9451503 X[[3L]]

So obviously each element of the list does not contain the name I think it does. Can anyone suggest a way to get what I expected (shown below)?

  a          x  id
1 1  1.1138062 df1
2 2 -0.5742069 df1
3 3  0.7546323 df1
4 1  1.8358605 df2
5 2  0.9107199 df2
6 3  0.8313439 df2
7 1  0.5827148 df3
8 2 -0.9896495 df3
9 3 -0.9451503 df3

Not exactly an answer, but you might be interested in the various methods used [here](http://stackoverflow.com/q/15162197/324364). — joran, Mar 05 '13 at 02:04

iTech · Accepted Answer · 2013-03-05T02:25:07.433

10

Define your list with names and it should give you an .id column with the data.frame name

df.list <- list(df1=df1, df2=df2, df3=df3)
df.all <- ldply(df.list, rbind)

Output:

  .id a           x
1 df1 1  1.84658809
2 df1 2 -0.01177462
3 df1 3  0.58579469
4 df2 1 -0.64748756
5 df2 2  0.24384614
6 df2 3  0.59012676
7 df3 1 -0.63037679
8 df3 2 -1.17416295
9 df3 3  1.09349618

Then you can know the data.frame name from the column df.all$.id

Edit: As per @Gary Weissman's comment if you want to generate the names automatically you can do

names(df.list) <- paste0('df', seq_along(df.list))

edited Mar 05 '13 at 02:25

answered Mar 05 '13 at 02:12

iTech

I was writing up the same thing except with `do.call(rbind, df.list)` instead of `ldply` which also gives the original row. – N8TRO Mar 05 '13 at 02:20
The only disadvantage with this approach is that you actually have to type out the names of all your dataframes by hand. Could you do something like `names(df.list) <- paste0('df',seq_along(df.list))` – Gary Weissman Mar 05 '13 at 02:21
+1 This is really helpful - in essence the answer to my question is "Make it a named list" - but I appreciate that you went further! – alexwhan Mar 05 '13 at 02:24
@Gary Weissman - that is good in that it gives a unique identifier, but doesn't link to the specific name. Of course, once it's an unnamed list, it's not really possible to make that link. – alexwhan Mar 05 '13 at 02:26
Glad it worked. Yes the key is "*named list*" and you have the option to provide the name manually or automatically as per @Gary's comment – iTech Mar 05 '13 at 02:27
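
On the point raised in the comments about typing every name by hand: one possible way around it, sketched here in base R, is `mget()`, which looks objects up by name and returns a list named after them. This assumes the data frames exist as separate objects in the current environment and that their names can be written down or generated (here with `paste0`); the `df.names` variable is only illustrative.

```r
# Example data, as in the question
df1 <- data.frame(a = 1:3, x = rnorm(3))
df2 <- data.frame(a = 1:3, x = rnorm(3))
df3 <- data.frame(a = 1:3, x = rnorm(3))

# Build the vector of object names once (hypothetical helper variable)
df.names <- paste0("df", 1:3)

# mget() fetches each object by name from the current environment;
# the resulting list is named after the objects
df.list <- mget(df.names, envir = environment())

names(df.list)
```

Because the list is now named after the original objects, passing it to `ldply(df.list, rbind)` as in the answer above should label each row with the matching data frame name.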

Gary Weissman · Answer 2 · 2013-03-05T02:17:21.487

3

Using base only, one could try something like:

dd <- lapply(seq_along(df.list), function(x) cbind(df_name = paste0('df',x),df.list[[x]]))

do.call(rbind,dd)

edited Mar 05 '13 at 02:17

answered Mar 05 '13 at 02:04

Gary Weissman

Heh. I _did_ try it, but that was before you actually finished editing it. For a while, you had only the `lapply` line up there (and a different version of it to boot) that definitely did not work. – joran Mar 05 '13 at 02:12
ah sorry, my browser was freezing and my edits got mangled ;-( – Gary Weissman Mar 05 '13 at 02:13
+1 At least a unique name is given for each element, but the issue is that each is named in order (ie not to do with the dataframe name). – alexwhan Mar 05 '13 at 02:20
@alexwhan an alternative to grab the name could be to replace with `df_name = names(df.list)[x]` – Gary Weissman Mar 05 '13 at 02:25
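
The variant suggested in the last comment might look like the sketch below. It assumes the list has already been given names (written out by hand here); everything else follows the base R answer above.

```r
# Example data, as in the question
df1 <- data.frame(a = 1:3, x = rnorm(3))
df2 <- data.frame(a = 1:3, x = rnorm(3))
df3 <- data.frame(a = 1:3, x = rnorm(3))

# The list must be named for names(df.list) to return anything useful
df.list <- list(df1 = df1, df2 = df2, df3 = df3)

# Loop over positions so both the element and its name are available
dd <- lapply(seq_along(df.list), function(i) {
  cbind(df_name = names(df.list)[i], df.list[[i]])
})

# Stack the labelled pieces into a single data frame
df.all <- do.call(rbind, dd)
```

Each row of `df.all` then carries the name of the list element it came from, rather than a label generated from its position.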

score 2 · Answer 3 · answered Mar 05 '13 at 02:06

In your definition, df.list does not have names. Even with names, though, the deparse(substitute(x)) idiom does not appear to work easily, because lapply calls .Internal(lapply(X, FUN)); you would have to look at the source to see whether the object name is available and how to get it.

Something like

names(df.list) <- paste('df', 1:3, sep = '')

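foo <- function(n, .list) {
  # n is a list name; tag that element with its own name and return it
  .list[[n]]$id <- n
  .list[[n]]
}
# Apply over the names rather than the elements, then bind the pieces
do.call(rbind, lapply(names(df.list), foo, .list = df.list))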

  a          x id
1 1  0.8204213  a
2 2 -0.8881671  a
3 3  1.2880816  a
4 1 -2.2766111  b
5 2  0.3912521  b
6 3 -1.3963381  b
7 1 -1.8057246  c
8 2  0.5862760  c
9 3  0.5605867  c
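
The point about lapply can be checked with a small sketch. The helper `get_name` is hypothetical and only wraps the idiom from the question; the exact text returned inside lapply depends on the R version, so it is not shown here.

```r
df1 <- data.frame(a = 1:3, x = rnorm(3))
df2 <- data.frame(a = 1:3, x = rnorm(3))

# Hypothetical helper: return the expression the caller passed as x
get_name <- function(x) deparse(substitute(x))

# Called directly, the argument expression is the symbol df1
direct <- get_name(df1)

# Called through lapply, the argument expression is built by lapply itself,
# so the helper sees an internal indexing expression, not the object name
via_lapply <- unlist(lapply(list(df1, df2), get_name))

direct
via_lapply
```

This is why the id column in the question held text like `X[[1L]]`: by the time the function runs, the original object names are no longer part of the call.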

score 2 · Answer 4 · answered Mar 05 '13 at 02:38

If you want to use your function, replace deparse(substitute(x)) with match.call(). You want the second element of the call, and you need to convert it to character:

 name <- as.character(match.call()[[2]])
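
As a rough illustration, here is the question's function with that line swapped in, called directly on a single data frame. This sketch only covers the direct call; it makes no claim about what `ldply` passes internally.

```r
fun <- function(x) {
  # match.call() returns the call as written; element 2 is the first argument
  name <- as.character(match.call()[[2]])
  x$id <- name
  x
}

df1 <- data.frame(a = 1:3, x = rnorm(3))

# Direct call: the first argument in the call is the symbol df1
res <- fun(df1)
unique(res$id)
```

When the function is invoked through lapply, the call is constructed by lapply, so the second element is again an internal indexing expression rather than the object name. A named list is likely the more dependable route in that case.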

answered Mar 05 '13 at 02:38

Ricardo Saporta
