
In R, to apply some function to a column, you can do:

df$col <- someFunction(df$col)

Now my question is: how do you do the same thing when the data frames are stored in a nested list? Say I have the following list, where the data frames sit at the second level below the root.

                                           +------+------+
                                  type1    | id   | name |
                              +----------->|------|------|
                              |            |      |      |
                              |            |      |      |
                year1         |            +------+------+
           +------------------+
           |                  |
           |                  |            +------+------+-----+
           |                  |  type2     | meta1|meta2 | name|
           |                  +----------> |------|------|-----|
           |                               |      |      |     |
           +                               +------+------+-----+
           |                     type1    +------+------+
           |                  +---------> | id   |name  |
           |                  |           |------|------|
           |     year2        |           |      |      |
   list    +----------------->+           |      |      |
           +                  |           +------+------+
           |                  |  type2     +------+------+-----+
           |                  +--------->  | meta1|meta2 |name |
           |                               |------|------|-----|
           |                               |      |      |     |
           |                    type1      +------+------+-----+
           |                 +---------->  +------+------+
           |                 |             | id   |name  |
           |     year3       |             |------|------|
           +-----------------+             |      |      |
                             |             |      |      |
                             |  type2      +------+------+
                             +---------->  +------+------+-----+
                                           |meta1 | meta2|name |
                                           |------|------|-----|
                                           |      |      |     |
                                           +------+------+-----+

I want to modify the "name" column in each of the data frames at the leaves with some function and store the results back in the list. How do you do that?

Here is the example data:

data<-list()

data$yr2001$type1 <- df_2001_1 <- data.frame(index=1:3,name=c("jack","king","larry"))
data$yr2001$type2 <- df_2001_2 <- data.frame(index=1:5,name=c("man","women","oliver","jack","jill"))
data$yr2002$type1 <- df_2002_1 <- data.frame(index=1:3,name=c("janet","king","larry"))
data$yr2002$type2 <- df_2002_2 <- data.frame(index=1:5,name=c("alboyr","king","larry","rachel","sam"))
data$yr2003$type1 <- df_2003_1 <- data.frame(index=1:3,name=c("dan","jay","zang"))
data$yr2003$type2 <- df_2003_2 <- data.frame(index=1:5,name=c("zang","king","larry","kim","fran"))

Say I want to uppercase all of the names in the name column of each data frame stored in the list.

Alby

4 Answers


I agree with @joran's comment above: this is begging to be consolidated by adding type as a column. But here is one way to do it with rapply. This assumes that the name column is the only factor column in each nested data.frame. As in @josilber's answer, my function of choice is toupper.

rapply(data, function(x) toupper(as.character(x)), classes='factor', how='replace')

This will drop the data.frame class, but the essential structure is preserved. If your name columns are already character, then you would use:

rapply(data, toupper, classes='character', how='replace')
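Since R 4.0.0, data.frame() no longer converts strings to factors by default, so with the example data the classes='factor' call leaves the names untouched there. A minimal sketch that covers both cases by matching both classes, assuming (as above) that name is the only non-numeric column; otherwise every character/factor column gets uppercased too. The data is the question's example, trimmed to three leaves.

```r
# Example data from the question (trimmed). Depending on the R version,
# the name columns are factors (R < 4.0.0) or character (R >= 4.0.0).
data <- list()
data$yr2001$type1 <- data.frame(index = 1:3, name = c("jack", "king", "larry"))
data$yr2001$type2 <- data.frame(index = 1:5, name = c("man", "women", "oliver", "jack", "jill"))
data$yr2002$type1 <- data.frame(index = 1:3, name = c("janet", "king", "larry"))

# Match both classes so the call works whether or not strings became factors.
# Caveat: this uppercases EVERY character/factor column, not only `name`.
# Integer columns such as `index` are not in `classes` and are left as is.
res <- rapply(data, function(x) toupper(as.character(x)),
              classes = c("character", "factor"), how = "replace")

res$yr2001$type1$name  # the uppercased names "JACK", "KING", "LARRY"
```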
Matthew Plourde

You can nest two lapply calls to get at the inner data frames. Here, I apply toupper to each name column:

result <- lapply(data, function(x) {
  lapply(x, function(y) {
    y$name = toupper(y$name)
    return(y)
  })
})
result

# $yr2001
# $yr2001$type1
#   index  name
# 1     1  JACK
# 2     2  KING
# 3     3 LARRY
# 
# $yr2001$type2
#   index   name
# 1     1    MAN
# 2     2  WOMEN
# 3     3 OLIVER
# 4     4   JACK
# 5     5   JILL
# 
# 
# $yr2002
# $yr2002$type1
#   index  name
# 1     1 JANET
# 2     2  KING
# 3     3 LARRY
# 
# $yr2002$type2
#   index   name
# 1     1 ALBOYR
# 2     2   KING
# 3     3  LARRY
# 4     4 RACHEL
# 5     5    SAM
# 
# 
# $yr2003
# $yr2003$type1
#   index name
# 1     1  DAN
# 2     2  JAY
# 3     3 ZANG
# 
# $yr2003$type2
#   index  name
# 1     1  ZANG
# 2     2  KING
# 3     3 LARRY
# 4     4   KIM
# 5     5  FRAN
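For comparison, a minimal sketch of the same update with two plain for loops that reassign each leaf in place, which mirrors the question's df$col <- someFunction(df$col) pattern most directly. Like the nested lapply, it assumes exactly two levels of nesting; the data is the question's example, trimmed.

```r
# Example data from the question (trimmed to three leaves)
data <- list()
data$yr2001$type1 <- data.frame(index = 1:3, name = c("jack", "king", "larry"))
data$yr2001$type2 <- data.frame(index = 1:5, name = c("man", "women", "oliver", "jack", "jill"))
data$yr2002$type1 <- data.frame(index = 1:3, name = c("janet", "king", "larry"))

# Walk both levels by name and overwrite each leaf's column in place,
# just like df$col <- someFunction(df$col) on a single data frame.
# toupper() converts factors to character itself, so this works in any R version.
for (yr in names(data)) {
  for (ty in names(data[[yr]])) {
    data[[yr]][[ty]]$name <- toupper(data[[yr]][[ty]]$name)
  }
}

data$yr2001$type2$name
```

Unlike lapply, this modifies `data` itself rather than returning a new list.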
josliber
  • josilber, maybe I'm misinterpreting, but is this really recursive, or just two nested loops? Also, I hope you don't mind my borrowing `toupper` in my example. – BrodieG Feb 24 '14 at 22:52
  • Thanks -- updated the wording. Of course the `toupper` borrowing is fine :) – josliber Feb 24 '14 at 23:09

To illustrate (using your simplified example):

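library(reshape2)
# melt() stacks all the leaf data frames into one long data frame,
# recording each leaf's list names in extra label columns
dat1 <- melt(data, id.vars = c("index", "name"))
dat1$NAME <- toupper(dat1$name)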
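If you'd rather not depend on reshape2, here is a base-R sketch of the same consolidation (a swapped-in technique, not the melt call above). It assumes every leaf has exactly the index and name columns, as in the example data; leaves with extra columns, such as the meta1/meta2 columns in the diagram, would need those handled before rbind. The data is the question's example, trimmed.

```r
# Example data from the question (trimmed to three leaves)
data <- list()
data$yr2001$type1 <- data.frame(index = 1:3, name = c("jack", "king", "larry"))
data$yr2001$type2 <- data.frame(index = 1:5, name = c("man", "women", "oliver", "jack", "jill"))
data$yr2002$type1 <- data.frame(index = 1:3, name = c("janet", "king", "larry"))

# Build one long data frame, turning the list names into year and type columns.
pieces <- list()
for (yr in names(data)) {
  for (ty in names(data[[yr]])) {
    leaf <- data[[yr]][[ty]]
    pieces[[length(pieces) + 1]] <- data.frame(
      year = yr, type = ty, index = leaf$index,
      name = as.character(leaf$name), stringsAsFactors = FALSE
    )
  }
}
flat <- do.call(rbind, pieces)

# With everything in one table, the usual single-column idiom works again.
flat$NAME <- toupper(flat$name)
```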
joran
  • I ended up using this solution, although it doesn't directly answer the question I raised (compared to the other solutions using `lapply`, which I upvoted instead). You were correct that organizing the data this way resolves all the downstream problems and makes the analysis easy. It's as if I asked for a faster horse, and you gave me a car. – Alby Feb 25 '14 at 21:41

Here is a truly recursive version based on lapply (i.e. it will work with deeper nesting) that makes no assumptions other than that every terminal leaf is a data frame. Unfortunately, rapply won't stop its recursion at data.frames, so you have to use lapply if you want to operate on whole data frames (otherwise Matthew's answer is perfect).

samp.recur <- function(x) 
  lapply(x, 
    function(y) 
      if(is.data.frame(y)) transform(y, name=toupper(name)) else samp.recur(y))

This produces:

samp.recur(data)
# $yr2001
# $yr2001$type1
#   index  name
# 1     1  JACK
# 2     2  KING
# 3     3 LARRY
#
# $yr2001$type2
#   index   name
# 1     1    MAN
# 2     2  WOMEN
# 3     3 OLIVER
# 4     4   JACK
# 5     5   JILL
#
# etc...
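A possible generalization, sketched under the same assumption that every leaf is a data frame: pass the column name and the function as arguments instead of hard-coding them. `modify_leaves` is a hypothetical name, not part of the answer above; it uses `[[` rather than transform() so the column can be given as a string. If a leaf were an atomic vector rather than a data frame, the recursion would never terminate, just as with samp.recur.

```r
# Hypothetical helper: recurse through any depth of nesting and apply `f`
# to column `col` of every data frame leaf, returning a new list.
modify_leaves <- function(x, col, f) {
  lapply(x, function(y) {
    if (is.data.frame(y)) {
      y[[col]] <- f(y[[col]])  # same idea as df$col <- f(df$col)
      y
    } else {
      modify_leaves(y, col, f)  # descend one more level
    }
  })
}

# Uneven, deeper nesting than in the question, to exercise the recursion
deep <- list(
  a = list(b = list(c = data.frame(index = 1:2, name = c("jack", "jill")))),
  d = data.frame(index = 1, name = "sam")
)
out <- modify_leaves(deep, "name", toupper)
```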

That said, I also agree with the others that you may want to consider restructuring your data.

BrodieG