Calculate the mean for each column of a matrix in R

Question

I am working in R, using RStudio. I need to calculate the mean for each column of a data frame.

 cluster1         # a 5 by 4 data frame
 mean(cluster1)

I got :

  Warning message:
  In mean.default(cluster1) :
  argument is not numeric or logical: returning NA

But I can use

  mean(cluster1[[1]])

to get the mean of the first column.

How can I get the means for all columns?

Any help would be appreciated.

Useful self-help tools include the built-in [`apropos`](http://stat.ethz.ch/R-manual/R-patched/library/utils/html/apropos.html) (e.g. `apropos('mean')`), and [`findFn`](http://www.inside-r.org/packages/cran/sos/docs/findFn) in the `sos` package. — jbaums, Feb 16 '14 at 07:54
Also [this great reference card](http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf). — jbaums, Feb 16 '14 at 08:05

score 89 · Answer 1 · answered Feb 16 '14 at 06:15

You can use colMeans:

### Sample data
set.seed(1)
m <- data.frame(matrix(sample(100, 20, replace = TRUE), ncol = 4))

### Your error
mean(m)
# [1] NA
# Warning message:
# In mean.default(m) : argument is not numeric or logical: returning NA

### The result using `colMeans`
colMeans(m)
#   X1   X2   X3   X4 
# 47.0 64.4 44.8 67.8

answered Feb 16 '14 at 06:15 by A5C1D2H2I1M1N2O1R2T1

What if we want to calculate the `median`, or `min`, `max`? Do we have something like `colMedians`? – Triet Doan Nov 19 '16 at 16:28
@AnhTriet, maybe consider the ["matrixStats" package](https://cran.r-project.org/web/packages/matrixStats/index.html)? – A5C1D2H2I1M1N2O1R2T1 Nov 20 '16 at 12:54
@TrietDoan If you have a dataframe and want to calculate medians, standard deviations, etc. use apply: `apply(df, 2, median)`. The '2' here means by column. See here: https://stackoverflow.com/a/18047916/5824031 – haff Aug 18 '18 at 18:29
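The suggestion in the last comment can be sketched in base R as follows. The data frame `df` is made up for illustration and is not from the question.

```r
# Hypothetical example data (an assumption, not the asker's data)
df <- data.frame(a = c(1, 2, 3, 4, 100), b = c(10, 20, 30, 40, 50))

# MARGIN = 2 applies the function to each column
col_medians <- apply(df, 2, median)
col_mins    <- apply(df, 2, min)
col_maxs    <- apply(df, 2, max)

# sapply() treats the data frame as a list of columns
col_medians_s <- sapply(df, median)

print(col_medians)
print(col_mins)
print(col_maxs)
```

Both `apply(df, 2, median)` and `sapply(df, median)` give one value per column here. `apply` first converts the data frame to a matrix, so it is only appropriate when all columns are numeric.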

score 30 · Answer 2 · answered Feb 16 '14 at 07:30

You can use `apply` to run a function on the rows or columns of a matrix or numeric data frame:

cluster1 <- data.frame(a=1:5, b=11:15, c=21:25, d=31:35)

apply(cluster1,2,mean)  # applies function 'mean' to 2nd dimension (columns)

apply(cluster1,1,mean)  # applies function to 1st dimension (rows)

sapply(cluster1, mean)  # also takes mean of columns, treating data frame like list of vectors

answered Feb 16 '14 at 07:30 by bob

Better if you use `colMeans(m)` and `rowMeans(m)` instead. It is optimized and faster than `apply(cluster1,1,mean)` – Rentrop Feb 16 '14 at 08:53
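A minimal sketch checking that `colMeans` and `rowMeans` agree with the `apply` calls on the `cluster1` data frame from this answer. Timings are left out because they depend on the machine and the data size; names are dropped before comparing so that only the values are checked.

```r
cluster1 <- data.frame(a = 1:5, b = 11:15, c = 21:25, d = 31:35)

by_col_apply <- apply(cluster1, 2, mean)
by_col_fast  <- colMeans(cluster1)

by_row_apply <- apply(cluster1, 1, mean)
by_row_fast  <- rowMeans(cluster1)

# Compare values only, ignoring any difference in names
cols_match <- isTRUE(all.equal(unname(by_col_apply), unname(by_col_fast)))
rows_match <- isTRUE(all.equal(unname(by_row_apply), unname(by_row_fast)))

print(cols_match)
print(rows_match)
```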

score 12 · Answer 3 · edited Jan 23 '17 at 15:34

If you have NAs:

sapply(data, mean, na.rm = TRUE)   # Returns a vector (with names)
lapply(data, mean, na.rm = TRUE)   # Returns a list

Remember that `mean` needs numeric data. If you have mixed-class data, keep only the numeric columns first:

numdata <- data[sapply(data, is.numeric)]
sapply(numdata, mean, na.rm = TRUE)  # Returns a vector
lapply(numdata, mean, na.rm = TRUE)  # Returns a list
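As a small worked example of the approach above, assuming a made-up data frame with one character column and one missing value (the column names `id`, `x`, and `y` are hypothetical):

```r
# Hypothetical mixed-class data with a missing value
data <- data.frame(
  id = c("a", "b", "c", "d"),
  x  = c(1, 2, NA, 4),
  y  = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns, then take the column means
numdata <- data[sapply(data, is.numeric)]

means_with_na    <- sapply(numdata, mean)                # x is NA
means_without_na <- sapply(numdata, mean, na.rm = TRUE)  # NA dropped from x

print(means_with_na)
print(means_without_na)
```

Without `na.rm = TRUE`, any column containing an `NA` yields `NA` as its mean; with it, the mean of `x` is computed from the three remaining values.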

edited Jan 23 '17 at 15:34 by micstr · answered Dec 28 '16 at 11:52 by Gonzalo user7334982

score 2 · Answer 4 · answered Jul 17 '16 at 11:54

Another way is to use the purrr package:

# Example data, as in the answer above by @A Handcart And Mohair

set.seed(1)
m <- data.frame(matrix(sample(100, 20, replace = TRUE), ncol = 4))


library(purrr)
means <- map_dbl(m, mean)

means
#  X1   X2   X3   X4 
#47.0 64.4 44.8 67.8

score 2 · Answer 5 · edited Jun 11 '18 at 13:40

You can try this, but note that it returns a single mean over all values in the data frame, not one mean per column:

mean(as.matrix(cluster1))
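A short sketch of what this call returns compared with `colMeans`. It reuses the `cluster1` example data from an earlier answer, which is an assumption, since the asker's actual data is not shown.

```r
# Example data borrowed from an earlier answer (not the asker's data)
cluster1 <- data.frame(a = 1:5, b = 11:15, c = 21:25, d = 31:35)

overall <- mean(as.matrix(cluster1))  # one number: mean of all 20 values
per_col <- colMeans(cluster1)         # one mean per column

print(overall)
print(per_col)
```

Here `overall` is a single value, while `per_col` has one entry per column, which is what the question asks for.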

edited Jun 11 '18 at 13:40 by Patrick · answered Jun 11 '18 at 13:18 by weijia

score 2 · Answer 6 · edited Jan 15 '20 at 09:13

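Try this; it also handles columns that contain NAs:

library(dplyr)

df <- data.frame(a1 = 1:10, a2 = 11:20)

# summarise_each() and funs() are deprecated; across() needs dplyr >= 1.0.0
df %>% summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))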


# a1   a2
# 5.5 15.5

edited Jan 15 '20 at 09:13 · answered Jan 15 '20 at 09:04 by I Ju Cheng

score 2 · Answer 7 · answered Jan 26 '21 at 09:54

class(mtcars)
my.mean <- unlist(lapply(mtcars, mean)); my.mean



   mpg        cyl       disp         hp       drat         wt       qsec         vs 
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750   0.437500 
        am       gear       carb 
  0.406250   3.687500   2.812500

score 1 · Answer 8 · answered Aug 27 '21 at 12:07

colMeans(A, na.rm = FALSE, dims = 1)

https://stat.ethz.ch/R-manual/R-devel/library/base/html/colSums.html

`colMeans` is in the base package, so no additional library needs to be loaded.

The `colMeans` used in the first answer is this same base function, so it does not depend on any add-on package.
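The `dims` argument in the signature above only matters for arrays with more than two dimensions. A minimal sketch, assuming a made-up 2 x 3 x 4 array `A`:

```r
# Hypothetical 3-dimensional array for illustration
A <- array(1:24, dim = c(2, 3, 4))

# dims = 1 (the default): average over the first dimension only,
# leaving a 3 x 4 matrix
m1 <- colMeans(A, dims = 1)

# dims = 2: average over the first two dimensions,
# leaving a vector of length 4
m2 <- colMeans(A, dims = 2)

print(dim(m1))
print(m2)
```

For an ordinary matrix or data frame, the default `dims = 1` gives the usual one mean per column.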

score 1 · Answer 9 · answered Aug 26 '22 at 12:40

Another option is the function `fmean` from the collapse package. Here is a reproducible example:

set.seed(1)
m <- data.frame(matrix(sample(100, 20, replace = TRUE), ncol = 4))
library(collapse)
fmean(m)

Output:

  X1   X2   X3   X4 
47.0 64.4 44.8 67.8

score 0 · Answer 10 · answered Mar 04 '18 at 15:42

For diversity: another way is to convert a vector function into one that works on data frames, using `plyr::colwise()`:

set.seed(1)
m <- data.frame(matrix(sample(100, 20, replace = TRUE), ncol = 4))

plyr::colwise(mean)(m)


#   X1   X2   X3   X4
# 1 47 64.4 44.8 67.8
