I have a data frame and I want to calculate the mean of every column and save the result in a new data frame. I found the solution in "calculate the mean for each column of a matrix in R"; however, it only covers matrices, not data frames.

structure(list(TotFlArea = c(1232, 596, 708, 1052, 716), logg_weighted_assess = c(13.7765298160156, 
13.1822275291412, 13.328376420438, 13.3076293132057, 13.5164823091252
), TypeDwel1.2.Duplex = c(0, 0, 0, 0, 0), TypeDwelApartment.Condo = c(0, 
1, 1, 1, 1), TypeDwelTownhouse = c(1, 0, 0, 0, 0), Age_new.70 = c(0, 
0, 0, 0, 0), Age_new0.1 = c(0, 0, 0, 0, 0), Age_new16.40 = c(1, 
1, 0, 1, 0), Age_new2.5 = c(0, 0, 0, 0, 0), Age_new41.70 = c(0, 
0, 0, 0, 0), Age_new6.15 = c(0, 0, 1, 0, 1), LandFreehold = c(1, 
1, 1, 0, 1), LandLeasehold.prepaid = c(0, 0, 0, 1, 0), LandOthers = c(0, 
0, 0, 0, 0), cluster_K_mean.1 = c(0, 0, 0, 0, 0)), row.names = c("1", 
"2", "3", "4", "5"), class = "data.frame")

Can you please advise how I can do this?

Note: my data frame can contain NA values, which should be excluded from the mean calculation.

Ross_you

1 Answer

In addition to what @akrun pointed out, another alternative is:

apply(df, 2, mean)

Here the second argument, 2, applies the function over columns; 1 would apply it over rows.
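Since the question asks for NA values to be excluded and for the result to be stored in a new data frame, here is a minimal sketch of one way to do both. The small data frame and the names means_apply, means_col and mean_df are made up for illustration, and the sketch assumes every column is numeric (apply converts the data frame to a matrix first, so a non-numeric column would change the result).

```r
# Small illustrative data frame with one missing value (made-up data)
df <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

# Extra arguments to apply() are passed on to mean(), so NAs are skipped
means_apply <- apply(df, 2, mean, na.rm = TRUE)

# colMeans() accepts na.rm directly
means_col <- colMeans(df, na.rm = TRUE)

# Store the means as a one-row data frame, one column per original column
mean_df <- as.data.frame(t(means_col))
print(mean_df)
```

Both calls return a named numeric vector; transposing it before as.data.frame keeps the original column names as columns of the new data frame.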

apply is flexible: you can swap mean for another summary function (such as median), or restrict it to selected columns only, e.g. apply(df[,c('a', 'b')], 2, mean). Its disadvantage is speed, as the benchmark below shows.

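library(data.table)
library(microbenchmark)

# dummy data: 1e7 rows, column a plus columns b..j derived from it
# (the \(i) lambda shorthand needs R >= 4.1)
x <- 1e7
df <- data.table(a = 1:x)
y <- letters[2:10]
df[, (y) := lapply(2:10, \(i) a + i)]

# benchmark: time colMeans() against apply() over 30 runs each
z <- microbenchmark(
  colMeans = colMeans(df),
  apply    = apply(df, 2, mean),
  times    = 30
)

# plot the timing distributions
plot(z)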

[Figure: benchmark plot comparing colMeans and apply timings]
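To illustrate the flexibility mentioned above on a small scale, the following sketch applies a different summary function and restricts the calculation to selected columns. The data frame and the names sel_means and col_medians are hypothetical, and all columns are assumed to be numeric.

```r
# Small illustrative data frame (made-up data)
df <- data.frame(a = c(1, 2, 3, 10), b = c(2, 4, 6, 8), c = c(5, 5, 5, 5))

# Restrict the calculation to selected columns by name
sel_means <- apply(df[, c("a", "b")], 2, mean, na.rm = TRUE)

# Swap in a different summary function, here the median
col_medians <- apply(df, 2, median, na.rm = TRUE)

print(sel_means)
print(col_medians)
```

For plain column means, colMeans(df[, c("a", "b")], na.rm = TRUE) gives the same values as the first call; apply is mainly useful when the summary function is something other than a mean or sum.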

Sweepy Dodo