I would like to combine multiple data frames, but first I'd like to add each data frame's name as a character string in every entry of a new column. I'm almost there but can't see the problem. Code:

df1 <- data.frame("X1"=c(1,1),"X2"=c(1,1))
df2 <- data.frame("X1"=c(2,2),"X2"=c(2,2))
df3 <- data.frame("X1"=c(3,3),"X2"=c(3,3))

addCol <- function(df){df$newCol <- deparse(substitute(df)); df} 
# Extracts name of dataframe and writes it into entries of newCol

alldfsList <- lapply(list(df1,df2,df3), function(df) x <- addCol(df)) 
# Should apply addCol function to all dataframes, generates a list of lists

alldfs <- do.call(rbind, alldfsList) # Converts list of lists into dataframe

The problem is that the second command doesn't write the name of the dataframe into the column entries, but the placeholder, "df". But when I apply the addCol function manually to a single dataframe, it works. Can you help? Thanks!

Output:

> alldfs

  X1 X2 newCol
1  1  1     df
2  1  1     df
3  2  2     df
4  2  2     df
5  3  3     df
6  3  3     df
> 

Function applied to a single df works:

> addCol(df1)

  X1 X2 newCol
1  1  1    df1
2  1  1    df1
> 
  • Try `lapply(list(df1,df2,df3), function(df) addCol(df))` – Sotos Jul 13 '18 at 14:54
  • @Sotos Doesn't do the trick :| Problem persists. –  Jul 13 '18 at 14:55
  • Then maybe you should share a reproducible example of your data frames – Sotos Jul 13 '18 at 14:57
  • @Sotos - updated –  Jul 13 '18 at 15:07
  • [This question](https://stackoverflow.com/questions/16951080/can-lists-be-created-that-name-themselves-based-on-input-object-names) looks related, and the chosen answer might give some clues. I'd probably do this by making the list via `tibble::lst` as shown [in another answer](https://stackoverflow.com/a/51276081/2461552) (since `lst()` adds names) and then use `dplyr::bind_rows()` with the `.id` argument to add the column of id variables when row binding. – aosmith Jul 13 '18 at 15:17

2 Answers

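The easiest approach is to use dplyr::bind_rows with its .id argument. lst() builds a list whose elements are named after the objects passed in, and those names fill the new column: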

library(dplyr)
bind_rows(lst(df1,df2,df3),.id="newCol")
#   newCol X1 X2
# 1    df1  1  1
# 2    df1  1  1
# 3    df2  2  2
# 4    df2  2  2
# 5    df3  3  3
# 6    df3  3  3
moodymudskipper
  • Maybe use `tibble::lst()` [as shown here](https://stackoverflow.com/a/51276081/2461552) prior to `bind_rows()` to put the actual names in instead of an index? – aosmith Jul 13 '18 at 15:19
  • Yes! Works! Thanks to both of you! :) –  Jul 13 '18 at 15:22

Moody_Mudskipper's answer is the better solution; this one is just to help you understand what's happening in your code.

From the substitute help page:

substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env

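When you call addCol from inside the anonymous function in lapply, substitute sees the expression that was passed to addCol at that call site. That expression is the anonymous function's argument name (df), not the name of the original object. Look what happens when you rename that argument in lapply: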

> lapply(list(df1,df2,df3), function(x) x <- addCol(x)) 
[[1]]
  X1 X2 newCol
1  1  1      x
2  1  1      x

[[2]]
  X1 X2 newCol
1  2  2      x
2  2  2      x

[[3]]
  X1 X2 newCol
1  3  3      x
2  3  3      x
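
The same effect can be reproduced without lapply. This is a minimal sketch; `arg_name` and `wrapper` are hypothetical helpers introduced only for illustration and are not part of the question's code:

```r
# Hypothetical helper: returns the expression supplied for `df` as a string.
arg_name <- function(df) deparse(substitute(df))

df1 <- data.frame(X1 = c(1, 1), X2 = c(1, 1))

# Called directly, the supplied expression is the symbol df1.
direct <- arg_name(df1)

# Called through a wrapper, the supplied expression is the wrapper's own
# parameter, x, regardless of which object x refers to.
wrapper <- function(x) arg_name(x)
wrapped <- wrapper(df1)

print(direct)   # "df1"
print(wrapped)  # "x"
```

substitute only looks one call up, so any intermediate function hides the original object name.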

What you need is a different way to get the object name, or a function that takes the name as its input. Here's an example:

addCol <- function(df.name) {
  dataf <- get(df.name)
  dataf$newCol <- df.name
  return(dataf)
}

> # Anchor the pattern so objects such as alldfs or alldfsList are not picked up
> do.call(rbind, lapply(ls(pattern='^df[0-9]+$'), addCol))
  X1 X2 newCol
1  1  1    df1
2  1  1    df1
3  2  2    df2
4  2  2    df2
5  3  3    df3
6  3  3    df3
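
If you would rather not depend on `ls()` and `get()`, a base R variant is to write the names into a list yourself and pass each name alongside its data frame. This is only a sketch under that assumption; the object names `dfs`, `tagged` and the anonymous helper are illustrative:

```r
df1 <- data.frame(X1 = c(1, 1), X2 = c(1, 1))
df2 <- data.frame(X1 = c(2, 2), X2 = c(2, 2))
df3 <- data.frame(X1 = c(3, 3), X2 = c(3, 3))

# Name each list element explicitly so the names travel with the data.
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Map walks over the data frames and their names in parallel.
tagged <- Map(function(d, nm) {
  d$newCol <- nm
  d
}, dfs, names(dfs))

# unname() drops the list names before binding so they are not
# folded into the row names of the result.
alldfs <- do.call(rbind, unname(tagged))
print(alldfs)
```

Because the names are supplied explicitly, nothing else in the workspace can be picked up by accident.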