Calculate the mean of each numeric column and add the result as a row

Question

So, I would like to calculate the mean of each numeric column and put the results in a new row at the bottom of the data. Let's start with the data:

> head(tbl_mut)

       timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
    1   4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
    2  45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
    3 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
    4   4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92

And that's what I want to achieve:

       timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
    1   4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
    2  45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
    3 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
    4   4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92
    .....
    445    X          X          X          X          X         X           X          X

X is the mean of the values in that column.

Note that the data may contain other, non-numeric columns.

score 6 · Accepted Answer · edited Aug 16 '20 at 10:34

Use rbind and colMeans, as in:

> rbind(tbl_mut, colMeans = colMeans(tbl_mut))
          timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
1          4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
2         45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
3        639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4          4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92
colMeans 173482.724  497479.54  319083.15  330634.05  331434.59 160144.458  369657.83 264901.15
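A self-contained version of the call above, for anyone who wants to run it. It assumes the four rows printed in the question are the whole data (the real tbl_mut is longer), rebuilt by hand with data.frame().

```r
# Rebuild the four rows printed in the question (all columns numeric)
tbl_mut <- data.frame(
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32),
  timetE2_2 = c(27675.59, 111309.64, 1117027.29, 20320.08),
  timetE3_2 = c(26374.06, 129781.17, 1147967.45, 18413.54),
  timetE4_2 = c(43310.01, 96924.62, 1156442.48, 29061.25),
  eve_mean  = c(7774.442, 43374.117, 585562.724, 3866.547),
  mor_mean  = c(39113.53, 119476.16, 1296530.34, 23511.30),
  tot_mean  = c(23443.99, 81425.14, 941046.53, 13688.92)
)

# colMeans() returns one mean per column; rbind() appends it as a single row
# and uses the argument name ("colMeans") as that row's name
res <- rbind(tbl_mut, colMeans = colMeans(tbl_mut))
res
```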

EDIT

Suppose your data frame contains both numeric and non-numeric columns (like the 'Description' column):

> df
  Description  timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
1           A   4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
2           B  45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
3           C 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4           D   4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92

...then you can use sapply(df, is.numeric) to find the numeric columns and compute colMeans on those columns only.

> suppressWarnings(rbind(df, colMeans = colMeans(df[, sapply(df, is.numeric)])))
         Description  timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
1                  A   4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
2                  B  45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
3                  C 639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
4                  D   4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92
colMeans        <NA> 497479.542  319083.15  330634.05  331434.59  160144.46 369657.833  264901.15 173482.72

Or, if you know the index of the non-numeric column (e.g. the first column), you can drop that column with df[, -1]:

suppressWarnings(rbind(df, colMeans = colMeans(df[, -1])))
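Note that in the output above the means are shifted one column to the left: timetE4_1 shows the mean of timetE1_2, and tot_mean shows the mean of timetE4_1. colMeans() returns 8 values for a 9-column data frame, and rbind() places a plain vector by position, recycling it to fill every column; the df[, -1] variant has the same problem. One way around it, sketched below on a three-column subset of the example data, is to build a full-width named list (NA for the non-numeric columns), because rbind() matches the names of a list argument to the column names.

```r
# Three-column subset of the example data (one non-numeric column)
df <- data.frame(
  Description = c("A", "B", "C", "D"),
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32)
)

num <- sapply(df, is.numeric)

# Start from a full-width row of NAs, named after the columns of df
new_row <- setNames(rep(list(NA), ncol(df)), names(df))
# Fill in the means of the numeric columns only
new_row[num] <- as.list(colMeans(df[num]))

# A named list is matched to the columns by name, so nothing is shifted
res <- rbind(df, colMeans = new_row)
res
```

No suppressWarnings() is needed here, because no numeric value is ever written into the Description column.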

I forgot that there is one column with non-numeric values: it holds the row names, and the column name is "Description". How can I ignore it? Thx. — Rechlay, Nov 05 '13 at 11:36

score 5 · Answer 2 · edited May 23 '17 at 12:24

R does have a function addmargins that lets you do something like this, but it expects a table or matrix as the input.

addmargins(as.matrix(mydf), 1, FUN = mean)
#       timetE4_1  timetE1_2  timetE2_2  timetE3_2  timetE4_2   eve_mean   mor_mean  tot_mean
# 1      4048.605   59094.48   27675.59   26374.06   43310.01   7774.442   39113.53  23443.99
# 2     45729.986  139889.21  111309.64  129781.17   96924.62  43374.117  119476.16  81425.14
# 3    639686.154 1764684.16 1117027.29 1147967.45 1156442.48 585562.724 1296530.34 941046.53
# 4      4466.153   26250.32   20320.08   18413.54   29061.25   3866.547   23511.30  13688.92
# mean 173482.724  497479.54  319083.15  330634.05  331434.59 160144.458  369657.83 264901.15
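A runnable sketch of the addmargins() route, again on a small subset of the example data. Two precautions here are my own assumptions rather than part of the answer: only numeric columns are kept, because as.matrix() on a data frame with a character column produces a character matrix (on which mean() returns NA with a warning); and the matrix gets explicit row names, because as.matrix() drops a data frame's automatic row names, and a first dimension without labels is one plausible cause of the "length of 'dimnames' [1] not equal to array extent" error quoted in the comments.

```r
tbl_mut <- data.frame(
  Description = c("A", "B", "C", "D"),
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32)
)

# Keep only numeric columns so the matrix stays numeric
num_cols <- vapply(tbl_mut, is.numeric, logical(1))
m <- as.matrix(tbl_mut[num_cols])

# as.matrix() drops automatic row names, so label the rows explicitly
rownames(m) <- rownames(tbl_mut)

# margin = 1 extends the first dimension by one row, computed with FUN
# over each column
res <- addmargins(m, margin = 1, FUN = mean)
res
```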

Update

There is a conceptually almost identical question here, so I thought I would share my answer from that question here as well.

Assume we're starting with:

mydf <- structure(list(Description = c("A", "B", "C", "D"), 
    timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153), 
    Boo = structure(1:4, .Label = c("a", "b", "c", "d"), 
    class = "factor"), timetE1_2 = c(59094.48, 139889.21, 
    1764684.16, 26250.32), timetE2_2 = c(27675.59, 111309.64, 
    1117027.29, 20320.08), Baa = c(FALSE, FALSE, TRUE, NA)), 
    .Names = c("Description", "timetE4_1", "Boo", "timetE1_2", 
    "timetE2_2", "Baa"), row.names = c("1", "2", "3", "4"), 
    class = "data.frame")

mydf
#   Description  timetE4_1 Boo  timetE1_2  timetE2_2   Baa
# 1           A   4048.605   a   59094.48   27675.59 FALSE
# 2           B  45729.986   b  139889.21  111309.64 FALSE
# 3           C 639686.154   c 1764684.16 1117027.29  TRUE
# 4           D   4466.153   d   26250.32   20320.08    NA

@Jilber's solution won't work in that case: rbind places the vector of means by position, so many values end up in the wrong columns. Instead, use rbind.fill from the "plyr" package. In this example I've used sapply to apply the summary function (sum here; use mean for the question's case) to show that you can use whatever function you want, not just the col* functions.

library(plyr)
useme <- sapply(mydf, is.numeric)
rbind.fill(mydf, data.frame(t(sapply(mydf[useme], sum))))
#   Description  timetE4_1  Boo  timetE1_2  timetE2_2   Baa
# 1           A   4048.605    a   59094.48   27675.59 FALSE
# 2           B  45729.986    b  139889.21  111309.64 FALSE
# 3           C 639686.154    c 1764684.16 1117027.29  TRUE
# 4           D   4466.153    d   26250.32   20320.08    NA
# 5        <NA> 693930.898 <NA> 1989918.17 1276332.60    NA
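If you'd rather not depend on plyr, a similar result can be sketched in base R: compute one value per column (FUN for numeric columns, NA otherwise) as a named list, which rbind() matches to the columns by name. The helper add_summary_row() and its label argument are hypothetical names for this sketch, and mydf is rebuilt with data.frame() rather than structure().

```r
# Hypothetical helper: append one row holding FUN(column) for numeric
# columns and NA for every other column (character, factor, logical, ...)
add_summary_row <- function(df, FUN = mean, label = "summary") {
  summary_row <- lapply(df, function(col) if (is.numeric(col)) FUN(col) else NA)
  # rbind() matches the elements of a named list to columns by name
  out <- rbind(df, summary_row)
  rownames(out)[nrow(out)] <- label
  out
}

mydf <- data.frame(
  Description = c("A", "B", "C", "D"),
  timetE4_1 = c(4048.605, 45729.986, 639686.154, 4466.153),
  Boo = factor(c("a", "b", "c", "d")),
  timetE1_2 = c(59094.48, 139889.21, 1764684.16, 26250.32),
  timetE2_2 = c(27675.59, 111309.64, 1117027.29, 20320.08),
  Baa = c(FALSE, FALSE, TRUE, NA)
)

res_sum  <- add_summary_row(mydf, FUN = sum,  label = "sum")
res_mean <- add_summary_row(mydf, FUN = mean, label = "mean")
res_mean
```

With FUN = sum the new row holds the same totals as the rbind.fill output above; FUN = mean gives the row the question asked for. The factor column Boo keeps its class, with NA in the new row.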

Error in array(values, dim = newdim, dimnames = newdimnames) : length of 'dimnames' [1] not equal to array extent — Rechlay, Nov 05 '13 at 11:53
@Rechlay, wow. You can post an error message! But seriously, have you played with the function at all to try different settings and see how it works or understand why you are getting an error message? Perhaps if you posted a reproducible example, you would get more meaningful help. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for details. — A5C1D2H2I1M1N2O1R2T1, Nov 05 '13 at 12:16
