How to apply functions in columns for data frames with different sizes in nested list?

Question

In R, to apply some function to a column, you can do:

df$col <- someFunction(df$col)

Now my question is, how do you do a similar task when the data frames are in a nested list? Say I have a list like the following, where the data frames sit at the second level from the root.

                                           +------+------+
                                  type1    | id   | name |
                              +----------->|------|------|
                              |            |      |      |
                              |            |      |      |
                year1         |            +------+------+
           +------------------+
           |                  |
           |                  |            +------+------+-----+
           |                  |  type2     | meta1|meta2 | name|
           |                  +----------> |------|------|-----|
           |                               |      |      |     |
           +                               +------+------+-----+
           |                     type1    +------+------+
           |                  +---------> | id   |name  |
           |                  |           |------|------|
           |     year2        |           |      |      |
   list    +----------------->+           |      |      |
           +                  |           +------+------+
           |                  |  type2     +------+------+-----+
           |                  +--------->  | meta1|meta2 |name |
           |                               |------|------|-----|
           |                               |      |      |     |
           |                    type1      +------+------+-----+
           |                 +---------->  +------+------+
           |                 |             | id   |name  |
           |     year3       |             |------|------|
           +-----------------+             |      |      |
                             |             |      |      |
                             |  type2      +------+------+
                             +---------->  +------+------+-----+
                                           |meta1 | meta2|name |
                                           |------|------|-----|
                                           |      |      |     |
                                           +------+------+-----+

I want to modify the "name" column in each of the data frames at the leaves with some function and store the results there. How do you do that?

Here is the example data:

data<-list()

data$yr2001$type1 <- df_2001_1 <- data.frame(index=1:3,name=c("jack","king","larry"))
data$yr2001$type2 <- df_2001_2 <- data.frame(index=1:5,name=c("man","women","oliver","jack","jill"))
data$yr2002$type1 <- df_2002_1 <- data.frame(index=1:3,name=c("janet","king","larry"))
data$yr2002$type2 <- df_2002_2 <- data.frame(index=1:5,name=c("alboyr","king","larry","rachel","sam"))
data$yr2003$type1 <- df_2003_1 <- data.frame(index=1:3,name=c("dan","jay","zang"))
data$yr2003$type2 <- df_2003_2 <- data.frame(index=1:5,name=c("zang","king","larry","kim","fran"))

Say I want to uppercase all of the names in the name column of each data frame stored in the list.

My general advice would be to not organize your data this way. It looks to me like what you have here are two (or possibly just one) un-melted data frames. — joran, Feb 24 '14 at 22:09

score 3 · Answer 1 · answered Feb 24 '14 at 22:20

I agree with @joran's comment above---this is begging to be consolidated by adding type as a column. But here is one way with rapply. This assumes that the name column is the only factor column in each nested data.frame. As in @josilber's answer, my function of choice is toupper.

rapply(data, function(x) toupper(as.character(x)), classes='factor', how='replace')

This will drop the data.frame class, but the essential structure is preserved. If your name columns are already character, then you would use:

rapply(data, toupper, classes='character', how='replace')
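
A small sketch of what the "only column of that class" assumption means in practice. The data below is a cut-down stand-in for the question's list, and the `city` column is a hypothetical addition used only to show that `rapply` transforms every leaf of the matching class, not just `name`:

```r
# Cut-down stand-in for the question's list; `city` is a hypothetical extra column.
data <- list(
  yr2001 = list(
    type1 = data.frame(index = 1:2,
                       name = c("jack", "king"),
                       city = c("oslo", "rome"),
                       stringsAsFactors = FALSE)
  )
)

# Every character leaf is passed to toupper; other classes are left alone.
res <- rapply(data, toupper, classes = "character", how = "replace")

res$yr2001$type1$name   # "JACK" "KING"
res$yr2001$type1$city   # "OSLO" "ROME" (also changed, because it is character too)
res$yr2001$type1$index  # 1 2 (integer, untouched)
```

If only one column should change, an approach that addresses the column by name, such as the `lapply` answers, is probably the safer choice.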

josliber · Answer 2 · 2014-02-24T23:08:32.723

You can nest the lapply function twice to get at the inner data frames. Here, I apply toupper to each name variable:

result <- lapply(data, function(x) {
  lapply(x, function(y) {
    y$name = toupper(y$name)
    return(y)
  })
})
result

# $yr2001
# $yr2001$type1
#   index  name
# 1     1  JACK
# 2     2  KING
# 3     3 LARRY
# 
# $yr2001$type2
#   index   name
# 1     1    MAN
# 2     2  WOMEN
# 3     3 OLIVER
# 4     4   JACK
# 5     5   JILL
# 
# 
# $yr2002
# $yr2002$type1
#   index  name
# 1     1 JANET
# 2     2  KING
# 3     3 LARRY
# 
# $yr2002$type2
#   index   name
# 1     1 ALBOYR
# 2     2   KING
# 3     3  LARRY
# 4     4 RACHEL
# 5     5    SAM
# 
# 
# $yr2003
# $yr2003$type1
#   index name
# 1     1  DAN
# 2     2  JAY
# 3     3 ZANG
# 
# $yr2003$type2
#   index  name
# 1     1  ZANG
# 2     2  KING
# 3     3 LARRY
# 4     4   KIM
# 5     5  FRAN
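
The question asks to store the results back in the list. One option is simply `data <- lapply(...)` with the code above. Another, sketched here as an assumption about what "in place" could look like rather than as part of the original answer, is a pair of plain `for` loops over the list names:

```r
# Cut-down stand-in for the question's list.
data <- list(
  yr2001 = list(
    type1 = data.frame(index = 1:3, name = c("jack", "king", "larry")),
    type2 = data.frame(index = 1:2, name = c("man", "jill"))
  )
)

# Walk year -> type and overwrite the name column of each data frame.
for (yr in names(data)) {
  for (tp in names(data[[yr]])) {
    data[[yr]][[tp]]$name <- toupper(data[[yr]][[tp]]$name)
  }
}

data$yr2001$type1$name  # "JACK" "KING" "LARRY"
```

Unlike `rapply`, this keeps each leaf a data.frame, since only one column is reassigned.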

josilber, maybe I'm misinterpreting, but is this really recursive, or just two nested loops? Also, I hope you don't mind my borrowing `toupper` in my example. — BrodieG, Feb 24 '14 at 22:52
Thanks -- updated the wording. Of course the `toupper` borrowing is fine :) — josliber, Feb 24 '14 at 23:09

score 2 · Accepted Answer · answered Feb 24 '14 at 22:23

To illustrate (using your simplified example):

library(reshape2)
# melt() stacks the nested data frames and records the list names in extra columns
dat1 <- melt(data, id.vars = c("index", "name"))
dat1$NAME <- toupper(dat1$name)

joran


I ended up using this solution, although it doesn't directly answer the question I raised (compared to the other solutions using `lapply`, which I upvoted instead). You were correct that organizing the data this way resolves all the downstream problems and makes the analysis easy. It's as if I asked for a faster horse, and you gave me a car – Alby Feb 25 '14 at 21:41
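
For readers without reshape2, the consolidation idea can be sketched in base R. This is not the code from the answer above. It assumes every leaf has the same columns (as in the example data, though not in the diagram), and the `year` and `type` column names are illustrative choices:

```r
# Cut-down stand-in for the question's list.
data <- list(
  yr2001 = list(
    type1 = data.frame(index = 1:2, name = c("jack", "king"),
                       stringsAsFactors = FALSE),
    type2 = data.frame(index = 1:3, name = c("man", "women", "oliver"),
                       stringsAsFactors = FALSE)
  ),
  yr2002 = list(
    type1 = data.frame(index = 1:2, name = c("janet", "larry"),
                       stringsAsFactors = FALSE)
  )
)

# Tag each leaf with its year and type, then stack all the pieces.
pieces <- list()
for (yr in names(data)) {
  for (tp in names(data[[yr]])) {
    pieces[[length(pieces) + 1]] <- data.frame(year = yr, type = tp,
                                               data[[yr]][[tp]],
                                               stringsAsFactors = FALSE)
  }
}
long <- do.call(rbind, pieces)

# One flat data frame means one ordinary column assignment.
long$name <- toupper(long$name)
```

If the two types really have different columns, the same loop could build one stacked data frame per type instead.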

BrodieG · Answer 4 · 2014-02-24T22:55:34.537

Here is a truly recursive version based on lapply (i.e. it will work with deeper nesting) that makes no assumptions other than that the only terminal leaves are data frames. Unfortunately rapply won't stop the recursion at data.frames, so you have to use lapply if you want to operate on the data frames themselves (otherwise Matthew's answer is perfect).

samp.recur <- function(x) 
  lapply(x, 
    function(y) 
      if(is.data.frame(y)) transform(y, name=toupper(name)) else samp.recur(y))

This produces:

samp.recur(data)
# $yr2001
# $yr2001$type1
#   index  name
# 1     1  JACK
# 2     2  KING
# 3     3 LARRY

# $yr2001$type2
#   index   name
# 1     1    MAN
# 2     2  WOMEN
# 3     3 OLIVER
# 4     4   JACK
# 5     5   JILL

# etc...

Though I do also agree with the others that you may want to consider restructuring your data.
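
The same recursion can be sketched with the column name and the function passed in as arguments. `modify_leaves` is a hypothetical helper name, and the two guards (non-list leaves are returned unchanged, data frames lacking the column are returned unchanged) are assumptions added here, not part of the answer above:

```r
# Hypothetical helper: apply f to column `col` of every data frame in a nested list.
modify_leaves <- function(x, col, f) {
  if (is.data.frame(x)) {
    # Only touch data frames that actually have the column.
    if (col %in% names(x)) x[[col]] <- f(x[[col]])
    return(x)
  }
  # Leaves that are neither lists nor data frames are returned as they are.
  if (!is.list(x)) return(x)
  lapply(x, modify_leaves, col = col, f = f)
}

# Uneven nesting depth to exercise the recursion.
nested <- list(
  a = list(b = list(df = data.frame(index = 1:2, name = c("dan", "jay")))),
  c = data.frame(index = 1L, name = "zang")
)

out <- modify_leaves(nested, "name", toupper)
out$a$b$df$name  # "DAN" "JAY"
out$c$name       # "ZANG"
```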
