0

Sample Data

I have a list of data frames with two columns each: 1. variable names, 2. integers.

df3 <- df2 <- df1 <- data.frame(Variable = LETTERS[1:5], Value = sample(10:20, 5, replace = TRUE))
df.list <- list(df1 = df1, df2 = df2, df3 = df3)
df.list

# $df1
#     Variable Value
# 1        A    17
# 2        B    16
# 3        C    16
# 4        D    18
# 5        E    10
#
# $df2
#     Variable Value
# 1        A    17
# ...

What I Want to Do

Each data frame in the list is named. I want to extract the name of the data frame and use it to rename the second column in that data frame:

# $df1
#     Variable df1
# 1        A    17
# 2        B    16
# 3        C    16
# 4        D    18
# 5        E    10
#
# $df2
#     Variable df2
# 1        A    17
# ... 

What I've Tried

I've written a function to do this using deparse(substitute() and regular expression pattern matching via sub():

mod.name <- function(x) {
    nx <- deparse(substitute(x))
    ny <- sub(".*\\$", "", nx)
    names(x)[2] <- ny
    x
}

When tested on a single data frame in the list, it works:

mod.name(df.list$df3)
df.list$df3

# Variable df3
# 1  A     17
# 2  B     16
# 3  C     16
# 4  D     18
# 5  E     10

However, when using lapply to do it on all the data frames in the list, it does not:

lapply(df.list, mod.name)
df.list

# $df1
#    Variable  X[[i]]
# 1        A     17
# 2        B     16
# 3        C     16
# 4        D     18
# 5        E     10
#
# $df2
#    Variable  X[[i]]
# 1        A    17
# ...

Of course, the issue of using deparse(substitute() with lapply() has been discussed before on StackOverflow, but I could not get any of the solutions here, here, or here to work for me.

coip
  • 1,312
  • 16
  • 30
  • 1
    `Map(function(x,y){ names(y)[[2]] <- x; y }, names(df.list), df.list)` but this is a bad idea. – Frank Sep 27 '17 at 21:24
  • @Frank That works--thanks! But can you expound on why it's a bad idea? – coip Sep 27 '17 at 21:54
  • 1
    Sure. Embedding a variable (the 1,2,3 in df1, df2, df3) as part of a column name means you'll have to parse that column name again later. Also, retaining a list of data.frames with related contents instead of using `rbind` to make a single one will continue to cause messiness (requiring more `lapply` or `mapply` instead of normal table operations). Gregor discusses tradeoffs involved in a list vs a single table here https://stackoverflow.com/a/24376207/ and Hadley discusses putting vars in col names here: https://www.jstatsoft.org/article/view/v059i10 – Frank Sep 27 '17 at 22:04
  • 1
    So in this case, I'd do `data.table::rbindlist(df.list, id="src")` (mentioned in Gregor's post). – Frank Sep 27 '17 at 22:05
  • 1
    @Frank Thanks for the explanation. That's good to know. In this specific case, my goal was to extract the newly named second columns and `cbind` them into a new data frame to more easily compare how the values change for the variables listed in the first column, but perhaps it would be better to `rbind` it first and then reshape from long two wide. – coip Sep 27 '17 at 22:24

2 Answers2

2

You don't need deparse(substitute(.)).

nms <- setNames(, names(df.list))
df.list2 <- lapply(nms, function(x){
        names(df.list[[x]])[2] <- x
        df.list[[x]]
    })
df.list2

EDIT
Following a suggestion of Frank in the comment below, I have changed nms <- names(df.list) to nms <- setNames(, names(df.list)), and the name of the output list to df.list2.

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
2

Technically, this works for the stated goal:

Map(function(x,y){ names(y)[[2]] <- x; y }, names(df.list), df.list)

The OP said their eventual goal was to cbind the columns together (presumably since the Variable column is identical in all the tables), so here are some other ideas.

rbind and reshape. To get there, if all the variables are of the same type (e.g., integer or float), I would just store the data in long form:

library(data.table)
DT = rbindlist(df.list, id = "src")

    src Variable Value
 1: df1        A    17
 2: df1        B    11
 3: df1        C    20
 4: df1        D    10
 5: df1        E    19
 6: df2        A    17
 7: df2        B    11
 8: df2        C    20
 9: df2        D    10
10: df2        E    19
11: df3        A    17
12: df3        B    11
13: df3        C    20
14: df3        D    10
15: df3        E    19

From there, you can go back to wide format with

dcast(DT, Variable ~ src)

   Variable df1 df2 df3
1:        A  17  17  17
2:        B  11  11  11
3:        C  20  20  20
4:        D  10  10  10
5:        E  19  19  19

cbind. If the columns have different types, they could be brought together with

as.data.table(c(df.list[[1]][1], lapply(df.list, `[`, -1)))

   Variable df1.Value df2.Value df3.Value
1:        A        17        17        17
2:        B        11        11        11
3:        C        20        20        20
4:        D        10        10        10
5:        E        19        19        19

If you're using data.table like this, setnames can be used to change the column names afterwards.


The OP mentioned both ideas in comments. I'm just illustrating the code details.

Either way, I think sticking to one table will be easier for later analysis.

Frank
  • 66,179
  • 8
  • 96
  • 180