How to calculate the mean value of all columns of a data frame

Question

I have a data frame and I want to calculate the mean of every column and save the result in a new data frame. I found the solution "calculate the mean for each column of a matrix in R"; however, it only covers matrices, not data frames.

structure(list(TotFlArea = c(1232, 596, 708, 1052, 716), logg_weighted_assess = c(13.7765298160156, 
13.1822275291412, 13.328376420438, 13.3076293132057, 13.5164823091252
), TypeDwel1.2.Duplex = c(0, 0, 0, 0, 0), TypeDwelApartment.Condo = c(0, 
1, 1, 1, 1), TypeDwelTownhouse = c(1, 0, 0, 0, 0), Age_new.70 = c(0, 
0, 0, 0, 0), Age_new0.1 = c(0, 0, 0, 0, 0), Age_new16.40 = c(1, 
1, 0, 1, 0), Age_new2.5 = c(0, 0, 0, 0, 0), Age_new41.70 = c(0, 
0, 0, 0, 0), Age_new6.15 = c(0, 0, 1, 0, 1), LandFreehold = c(1, 
1, 1, 0, 1), LandLeasehold.prepaid = c(0, 0, 0, 1, 0), LandOthers = c(0, 
0, 0, 0, 0), cluster_K_mean.1 = c(0, 0, 0, 0, 0)), row.names = c("1", 
"2", "3", "4", "5"), class = "data.frame")

Can you please advise how I can do this?

Note: my data frame can contain NA values, which should be excluded from the mean calculation.

You can use `colMeans`, which works for both data frames and matrices, i.e. `colMeans(yourdata, na.rm = TRUE)` (assuming all columns are numeric). – akrun Feb 20 '22 at 19:41

Sweepy Dodo · Answer 1 · 2022-02-20T20:22:39.410

1

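As @akrun pointed out, colMeans works here. Another alternative is apply:

apply(df, 2, mean, na.rm = TRUE)

where 2 means by column and 1 means by row; na.rm = TRUE is passed on to mean so that NA values are excluded.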
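A minimal sketch of both approaches on a small made-up data frame, including saving the result into a new one-row data frame as the question asks. The names `toy`, `m1`, `m2` and `means_df` are my own placeholders, not from the question:

```r
# Hypothetical toy data with one NA; this is not the asker's data
toy <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

# colMeans returns a named numeric vector; na.rm = TRUE drops NAs per column
m1 <- colMeans(toy, na.rm = TRUE)

# apply passes extra arguments (here na.rm) on to mean
m2 <- apply(toy, 2, mean, na.rm = TRUE)

# Save the means into a new one-row data frame with the same column names
means_df <- as.data.frame(t(m1))

print(means_df)
```

Both calls should give the same named vector; transposing it before `as.data.frame` keeps one column per original column rather than one row per column.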

apply is the more flexible option: you can swap mean for a different summary function, or apply it to selected columns only, e.g. apply(df[, c('a', 'b')], 2, mean). Its disadvantage is speed, as the benchmark below shows.

library(data.table)
library(microbenchmark)

# dummy data: 1e7 rows, columns a..j (the \(i) lambda syntax needs R >= 4.1)
x <- 1e7
df <- data.table(a = 1:x)
y <- letters[2:10]
df[, (y) := lapply(2:10, \(i) a + i)]

# benchmark: run each approach 30 times
z <- microbenchmark(
  colMeans = colMeans(df),
  apply    = apply(df, 2, mean),
  times    = 30
)

# compare the timing distributions
plot(z)
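The flexibility point can be sketched on a small made-up data frame. The data and the names `toy`, `meds` and `avgs` are placeholders of mine, and median is just one example of a function you might swap in:

```r
# Hypothetical data; column names are placeholders
toy <- data.frame(a = c(1, 2, 3, 10), b = c(5, NA, 7, 8), c = c(0, 0, 1, 1))

# Swap mean for median and restrict to selected columns
meds <- apply(toy[, c("a", "b")], 2, median, na.rm = TRUE)

# colMeans on the same subset, for comparison
avgs <- colMeans(toy[, c("a", "b")], na.rm = TRUE)

print(meds)
print(avgs)
```

colMeans has no equivalent for arbitrary functions, so apply (or a similar function that takes the summary function as an argument) is the natural choice when the summary is something other than the mean.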

edited Feb 20 '22 at 20:22

answered Feb 20 '22 at 19:43

Sweepy Dodo


If you're going to provide an alternative, you should add some commentary about when to prefer one over the other. – Gregor Thomas Feb 20 '22 at 19:44
Thanks, @GregorThomas. One advantage of `apply` is its flexibility if a different function is needed. On the other hand, I acknowledge that it is slower than `colMeans`. – Sweepy Dodo Feb 20 '22 at 20:06

Great - much better answer with a little commentary. – Gregor Thomas Feb 20 '22 at 20:13
