
I have a table like this:

enter image description here

I would like to calculate the average of the mean values in each column of the first table for rows that share the same start and end, in order to get a table like this:

enter image description here

Can you tell me how to build this table in R?

user3683485
  • Try `aggregate(. ~ chr + i.start + i.end + coverage_con, df, mean)` – Darren Tsai Sep 08 '20 at 13:48
  • A good practice when you share your data is to copy and paste a **print** of the data frames instead of images. Images are hard to reproduce because we cannot copy and paste from them. – rodolfoksveiga Sep 08 '20 at 13:52
  • @DarrenTsai the call to `aggregate` is much more concise than rodolfoksveiga's long pipe, and it does not need extra packages. Could you please make that an answer so I can upvote it? – Bernhard Sep 08 '20 at 13:54
  • Thanks @DarrenTsai, I already gave you a like, because both your solutions are smart. And `summarize_all()` is the way! – rodolfoksveiga Sep 08 '20 at 14:01
  • @Bernhard, `aggregate()` is more concise, but not as efficient as `summarize()` from **dplyr**. – rodolfoksveiga Sep 08 '20 at 14:02
  • @rodolfoksveiga Yes, many aspects go into that choice, and if the data are so big that the time taken for aggregation is a concern, dplyr is among the better options in R. – Bernhard Sep 08 '20 at 14:20
  • @rodolfoksveiga you are right! `summarize()` is more efficient on large data than `aggregate()`. – Darren Tsai Sep 08 '20 at 14:26

2 Answers


A base solution with aggregate():

aggregate(. ~ chr + i.start + i.end + coverage_con, df, mean)
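
To make the call above easier to try, here is a minimal sketch on toy data. The grouping columns come from the question; the value columns `sample_a` and `sample_b` and all the numbers are hypothetical, since the original table was only posted as an image.

```r
# Toy data: grouping columns as in the question; `sample_a` and
# `sample_b` are hypothetical stand-ins for the real value columns.
df <- data.frame(
  chr          = c("chr1", "chr1", "chr1", "chr2"),
  i.start      = c(100, 100, 500, 100),
  i.end        = c(200, 200, 600, 200),
  coverage_con = c(10, 10, 20, 30),
  sample_a     = c(1, 3, 5, 7),
  sample_b     = c(2, 4, 6, 8)
)

# The `.` on the left-hand side stands for every column that is not
# named on the right-hand side of the formula.
res <- aggregate(. ~ chr + i.start + i.end + coverage_con, df, mean)
print(res)
```

The first two rows share the same grouping values, so they collapse into one row and the result has three rows. Note that the formula method of aggregate() uses na.action = na.omit by default, so rows containing an NA in any column are dropped before the means are computed.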

The dplyr version:

library(dplyr)

df %>%
  group_by(chr, i.start, i.end, coverage_con) %>%
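  summarise(across(everything(), mean, .names = "average_{col}"))

summarise(across(everything(), mean)) is equivalent to summarise_all(mean), but across() can also rename the output columns through the glue specification in .names. Since dplyr 1.1.0, calling across() without .cols is deprecated, so everything() is spelled out here.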


If the data contain non-numeric columns besides the grouping columns, you can restrict the means to the numeric columns with where(), i.e.

... %>%
  summarise(across(where(is.numeric), mean, .names = "average_{col}"))

which is equivalent to summarise_if(is.numeric, mean).
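
As a rough illustration of the where(is.numeric) variant, the sketch below adds a character column to toy data and shows that it is left out of the summary. The columns `sample_a` and `label` and their values are hypothetical; dplyr 1.0.0 or later is assumed, since that is where across() and where() became available.

```r
library(dplyr)

# Toy data: `sample_a` (numeric) and `label` (character) are
# hypothetical columns added for illustration.
df <- data.frame(
  chr          = c("chr1", "chr1", "chr2"),
  i.start      = c(100, 100, 100),
  i.end        = c(200, 200, 200),
  coverage_con = c(10, 10, 30),
  sample_a     = c(1, 3, 7),
  label        = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(chr, i.start, i.end, coverage_con) %>%
  # Only numeric, non-grouping columns are averaged; `label` is dropped.
  summarise(across(where(is.numeric), mean, .names = "average_{col}"),
            .groups = "drop")

print(res)
```

The grouping columns are never selected by across(), so they keep their original names, and only `sample_a` is renamed to `average_sample_a`.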

Darren Tsai

Assuming your data frame is called df, you can do:

library(dplyr)
df %>%
  group_by(chr, i.start, i.end, coverage_con) %>%
  summarize_all(mean)
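
A small sketch to check this on toy data follows. The value columns `sample_a` and `sample_b` are hypothetical. Because the scoped verbs such as summarize_all() have been superseded since dplyr 1.0.0, the sketch also runs the across() spelling to show that, under these assumptions, both give the same means.

```r
library(dplyr)

# Toy data: `sample_a` and `sample_b` are hypothetical value columns.
df <- data.frame(
  chr          = c("chr1", "chr1", "chr1", "chr2"),
  i.start      = c(100, 100, 500, 100),
  i.end        = c(200, 200, 600, 200),
  coverage_con = c(10, 10, 20, 30),
  sample_a     = c(1, 3, 5, 7),
  sample_b     = c(2, 4, 6, 8)
)

# The scoped-verb form used in this answer.
res_all <- df %>%
  group_by(chr, i.start, i.end, coverage_con) %>%
  summarize_all(mean)

# The across() form, written out with everything().
res_across <- df %>%
  group_by(chr, i.start, i.end, coverage_con) %>%
  summarise(across(everything(), mean), .groups = "drop")

print(res_all)
print(res_across)
```

One difference worth knowing: summarize_all() returns a tibble that is still grouped by all but the last grouping column, while `.groups = "drop"` in the second call returns an ungrouped one.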
rodolfoksveiga