
I have data frames A, B, C, ... and want to modify each of them in the same way, e.g. re-ordering the levels of a factor that is present in all of the data frames:

A = data.frame( x=c('x','x','y','y','z','z') )
B = data.frame( x=c('x','y','z') )
C = data.frame( x=c('x','x','x','y','y','y','z','z','z') )

A$x = factor( A$x, levels=c('z','y','x') )
B$x = factor( B$x, levels=c('z','y','x') )
C$x = factor( C$x, levels=c('z','y','x') )

This gets laborious if there are lots of data frames and/or lots of modifications to be done. How can I do it concisely, using a loop or something better? A straightforward approach like

for ( D in list( A, B, C ) ) {
D$x = factor( D$x, levels=c('z','y','x') )
}

does not work, because D is only a copy of each list element, so the original data frames are never modified.

EDIT: added definitions of A, B, and C to make it reproducible.

shadow
baixiwei
  • Could you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – zero323 Nov 02 '13 at 03:14
  • Definitions of A, B, and C have been added so that you can run the code. – baixiwei Nov 02 '13 at 03:26
  • Thanks. I know it can be annoying, especially when the situation is obvious, but it is good practice and makes our lives easier :) – zero323 Nov 02 '13 at 06:31

2 Answers


One thing to note about R is that assignment with <- can be chained: it is right-associative, so the value on the far right is assigned to each target in turn. Thus, if the x column is the same in all of your data frames (same values, same number of rows), you should be able to do something like this:

A$x <- B$x <- C$x <- factor( C$x, levels=c('z','y','x') )
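A minimal sketch of the chained assignment, assuming two toy data frames whose x columns are identical (the question's A, B, and C have different numbers of rows, so the one-liner would not apply to them as written):

```r
# Toy data frames with identical x columns (an assumption for this sketch)
A <- data.frame(x = c('x', 'y', 'z'))
B <- data.frame(x = c('x', 'y', 'z'))

# The right-hand side is evaluated once; its value is assigned
# first to B$x and then to A$x
A$x <- B$x <- factor(B$x, levels = c('z', 'y', 'x'))

levels(A$x)
levels(B$x)
```

Because every target receives the same vector, this only makes sense when the column contents really are interchangeable across the data frames.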
gung - Reinstate Monica

If you don't need an explicit loop, you can use lapply:

ll <- lapply(
    list(A, B, C),
    function(df) {
        df$x <- factor(df$x, levels=c('z', 'y', 'x'))
        return(df)
    }
)

Since the function only works on copies of the data frames, the originals are left untouched; you'll have to use the list returned by lapply.
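One way to keep the results easy to reach is to name the list elements, so each modified copy can be looked up by the original data frame's name. This is a sketch under that assumption, using two of the question's data frames:

```r
A <- data.frame(x = c('x', 'x', 'y', 'y', 'z', 'z'))
B <- data.frame(x = c('x', 'y', 'z'))

# Naming the list elements carries the names through lapply()
ll <- lapply(
    list(A = A, B = B),
    function(df) {
        df$x <- factor(df$x, levels = c('z', 'y', 'x'))
        df  # the last expression is the return value
    }
)

# The modified copies live in the list; A and B themselves are unchanged
levels(ll$A$x)
levels(ll$B$x)
```

From here on you would refer to ll$A and ll$B (or ll[['A']]) rather than A and B.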

Edit

dfs <- list('A', 'B', 'C')
levels <- c('z', 'y', 'x')

l <- lapply(
    dfs,
    function(df) {
        # Get data frame by name
        df <- get(df)
        df$x <- factor(df$x, levels=levels)
        return(df)
    }
)


# Write each modified data frame back under its original name
for (i in seq_along(dfs)) {
    assign(dfs[[i]], l[[i]])
}
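A possible alternative to the get/assign loop, offered only as a sketch: base R's mget() collects objects into a named list and list2env() writes a named list back into an environment. The name new_levels is made up for this example (it avoids masking the base function levels), and the current environment is assumed to be the one holding A and B.

```r
A <- data.frame(x = c('x', 'x', 'y', 'y', 'z', 'z'))
B <- data.frame(x = c('x', 'y', 'z'))
new_levels <- c('z', 'y', 'x')

# mget() returns a named list of the objects with the given names
dfs <- mget(c('A', 'B'), envir = environment())

dfs <- lapply(dfs, function(df) {
    df$x <- factor(df$x, levels = new_levels)
    df
})

# list2env() assigns each list element to a variable of the same name,
# overwriting the original A and B
invisible(list2env(dfs, envir = environment()))

levels(A$x)
levels(B$x)
```

This still overwrites variables by name, so it shares the main drawback of the assign() version; keeping the data frames in a list throughout is usually easier to reason about.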
zero323
  • If you don't put in `return(df)` you will not get back dataframe elements. – IRTFM Nov 02 '13 at 07:18
  • This is OK, but I would like a way to modify the original data frames, or more precisely, I want to continue referring to them by their original names. Is there an easy way to get that result using the output of this solution? – baixiwei Nov 02 '13 at 14:03
  • I've posted an edit with an example solution, but I cannot say I like it. – zero323 Nov 02 '13 at 14:56