I have multiple data frames, and for each of them I want to compute the mean of every row and store it in a new variable. The mean should use only specific columns. All data frames have the same variables, but the columns I want to average are not adjacent, so I am selecting them by column index.

The data frames below are just an example; the ones I am actually using are larger.

mt1 <- matrix(c(1:50), nrow=5)
mt2 <- matrix(c(51:100), nrow=5)
mt3 <- matrix(c(101:150), nrow=5)
df1 <- data.frame(mt1)
df2 <- data.frame(mt2)
df3 <- data.frame(mt3)

dfs<-list(df1,df2,df3)

I tried to adapt the code from "Same function over multiple data frames in R", but it did not work. Here is what I tried and the errors I got:

Code 1)

lapply(dfs, function(x) { x$mean <- rowMeans(x[1,2,3,8,9,10]); x})

Error in `[.data.frame`(x, 1, 2, 3, 8, 9, 10) : 
unused arguments (8, 9, 10)

Code 2)

lapply(dfs, function(x) { x$mean <- rowMeans(select(1:3,8:10)); x })

Error in UseMethod("select") : 
no applicable method for 'select' applied to an object of class "c('integer', 'numeric')"

I am not sure what the issue is, so any help or suggestions on how to do this would be much appreciated. Thank you!

1 Answer

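You need to concatenate the column indices into a single vector with c(). Written as x[1,2,3,8,9,10], each number is passed to the subsetting operator as a separate argument, which is what triggers the "unused arguments" error.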

lapply(dfs, function(x) {x$mean <- rowMeans(x[c(1, 2, 3, 8, 9, 10)]); x})
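Note that lapply returns a new list and leaves dfs unchanged, so the result has to be assigned to keep the new column. A minimal sketch using the example data from the question (the name cols is just an illustrative helper, not part of the original code):

```r
# Rebuild the example data from the question
dfs <- list(
  data.frame(matrix(1:50, nrow = 5)),
  data.frame(matrix(51:100, nrow = 5)),
  data.frame(matrix(101:150, nrow = 5))
)

# Column positions to average (hypothetical helper name)
cols <- c(1, 2, 3, 8, 9, 10)

# Store the result; without the assignment the modified list is only printed
dfs <- lapply(dfs, function(x) {
  x$mean <- rowMeans(x[cols])
  x
})

dfs[[1]]$mean
# [1] 23.5 24.5 25.5 26.5 27.5
```

Keeping the indices in one vector also means the column selection only has to be changed in one place.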

It may be a little nicer using transform (the \(x) shorthand for function(x) requires R 4.1 or later).

lapply(dfs, \(x) transform(x, mean=rowMeans(x[c(1, 2, 3, 8, 9, 10)])))
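As for Code 2 in the question, select failed because it was never given the data frame: its first argument must be the data, and here it received 1:3 instead. A sketch of how that attempt could be repaired, assuming the select in question is dplyr's and that dplyr is installed:

```r
library(dplyr)  # assumption: select() in the question comes from dplyr

# Rebuild the example data from the question
dfs <- list(
  data.frame(matrix(1:50, nrow = 5)),
  data.frame(matrix(51:100, nrow = 5)),
  data.frame(matrix(101:150, nrow = 5))
)

# Pass the data frame x as the first argument, then the column positions
dfs_means <- lapply(dfs, function(x) {
  x$mean <- rowMeans(select(x, 1:3, 8:10))
  x
})

dfs_means[[1]]$mean
```

This should give the same row means as the base R versions above; which one to use is mostly a matter of taste.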
jay.sf