I'm new to R and need some help. I have a huge data frame with samples from different patients. Each patient has 24 'chrom's, and each 'chrom' has 3 segments. Below is an example of the data for patient 'A2461':

     ID chrom loc.start   loc.end num.mark seg.mean seg.sd seg.median seg.mad
1 A2461     1     61735  23342732    13103   0.0314 0.4757     0.0221  0.4811
2 A2461     1  23345569  54962669    17435  -0.0103 0.4807    -0.0292  0.4821
3 A2461     1  54963958  55075062       57   0.4841 0.4070     0.5201  0.3519
1 A2461     2     12784  17248573    13037  -0.0037 0.4643    -0.0053  0.4583
2 A2461     2  17248890  85480817    45819  -0.0331 0.4667    -0.0352  0.4635
3 A2461     2  85481399  89121495     1626   0.0153 0.4727     0.0000  0.4617

I currently have the total mean by using the following code:

seg_mean <- df$seg.mean
mean(seg_mean)

However, I would like to calculate the mean of 'seg.mean' for each chromosome, with the output showing the patient ID and chrom. Perhaps something like:

ID    chrom    seg.mean
A2461     1     0.1684
A2461     2    -0.0072

Any help would be much appreciated! Thanks for reading.

M--
  • [This answer](https://stackoverflow.com/questions/21982987/mean-per-group-in-a-data-frame) might be helpful. [Or this one](https://stackoverflow.com/questions/9723208/aggregate-summarize-multiple-variables-per-group-i-e-sum-mean-etc). – Nick Criswell Jun 09 '17 at 18:54
  • `aggregate(.~ID, data=df, mean)` – M-- Jun 09 '17 at 18:54

3 Answers

You can use the base R function aggregate:

aggregate(.~ ID + chrom, data=df, mean)

This will give you:

#      ID chrom loc.start  loc.end num.mark     seg.mean    seg.sd seg.median   seg.mad 
# 1 A2461     1  26123754 44460154 10198.33  0.168400000 0.4544667     0.1710 0.4383667 
# 2 A2461     2  34247691 63950295 20160.67 -0.007166667 0.4679000    -0.0135 0.4611667

Or, to get only the average of seg.mean, subset the result:

aggregate(.~ ID + chrom, data=df, mean)[,c("ID", "chrom","seg.mean")]

#      ID chrom     seg.mean 
# 1 A2461     1  0.168400000 
# 2 A2461     2 -0.007166667 

Data

df <- structure(list(ID = c("A2461", "A2461", "A2461", "A2461", "A2461", 
    "A2461"), chrom = c(1L, 1L, 1L, 2L, 2L, 2L), loc.start = c(61735L, 
    23345569L, 54963958L, 12784L, 17248890L, 85481399L), loc.end = c(23342732L, 
    54962669L, 55075062L, 17248573L, 85480817L, 89121495L), num.mark = c(13103L, 
    17435L, 57L, 13037L, 45819L, 1626L), seg.mean = c(0.0314, -0.0103, 
    0.4841, -0.0037, -0.0331, 0.0153), seg.sd = c(0.4757, 0.4807, 
    0.407, 0.4643, 0.4667, 0.4727), seg.median = c(0.0221, -0.0292, 
    0.5201, -0.0053, -0.0352, 0), seg.mad = c(0.4811, 0.4821, 0.3519, 
    0.4583, 0.4635, 0.4617)), .Names = c("ID", "chrom", "loc.start", 
    "loc.end", "num.mark", "seg.mean", "seg.sd", "seg.median", "seg.mad"
    ), row.names = c(NA, -6L), class = "data.frame")
M--
library(dplyr)

# Mean of seg.mean for each patient (ID) and chromosome
seg_mean <- df %>% group_by(ID, chrom) %>% summarise(seg.mean = mean(seg.mean))
M--
ssp3nc3r
  • Do you have any suggestions on how to write this as a function? I would like to implement this on different patients and maybe create a loop as I have a big dataset with over 100 patients. – Young Autobot Jun 10 '17 at 13:48
  • You can wrap anything in a function `myfunc <- function(x) {}` but I thought the above was for different patients: `ID`. So I'm not sure exactly what you're looking for. – ssp3nc3r Jun 10 '17 at 15:21
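To illustrate the point in the comments above: because the grouping already includes `ID`, one call covers every patient in the data frame, so no loop should be needed. The sketch below wraps the grouping in a function. The name `mean_seg_by_chrom`, the second patient `B0001`, and all values are invented for illustration, and it uses base R `aggregate` rather than dplyr so that it runs without extra packages.

```r
# Hypothetical helper: mean of seg.mean per patient (ID) and chromosome
mean_seg_by_chrom <- function(data) {
  aggregate(seg.mean ~ ID + chrom, data = data, FUN = mean)
}

# Toy data with two patients (values invented for illustration)
segs <- data.frame(
  ID       = rep(c("A2461", "B0001"), each = 4),
  chrom    = rep(c(1L, 1L, 2L, 2L), times = 2),
  seg.mean = c(0.1, 0.3, -0.2, 0.4, 0.0, 0.2, 0.5, 0.7),
  stringsAsFactors = FALSE
)

# One call handles all patients; sort by ID, then chrom, for readability
result <- mean_seg_by_chrom(segs)
result <- result[order(result$ID, result$chrom), ]
print(result)
```

If a per-patient table is really needed afterwards, `split(result, result$ID)` returns a list with one data frame per patient.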

Just a small modification of Masoud's solution: name seg.mean on the left-hand side of the formula so that only that column is averaged.

aggregate(seg.mean~ID+chrom , df , mean)
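The two formulas can give different answers when other columns contain missing values. The formula method of `aggregate` defaults to `na.action = na.omit`, so with `. ~ ID + chrom` a row is dropped if any column has an `NA`, while `seg.mean ~ ID + chrom` only looks at the columns it names. A minimal sketch with invented toy data (the data frame `toy` and its values are assumptions, not from the question):

```r
# Hypothetical toy data: one seg.sd value is missing
toy <- data.frame(
  ID       = rep("A2461", 4),
  chrom    = c(1L, 1L, 2L, 2L),
  seg.mean = c(0.1, 0.3, -0.2, 0.4),
  seg.sd   = c(0.5, NA, 0.4, 0.6),
  stringsAsFactors = FALSE
)

# Dot formula: the row with NA in seg.sd is removed before averaging,
# so chrom 1 is averaged over a single row
all_cols <- aggregate(. ~ ID + chrom, data = toy, FUN = mean)[, c("ID", "chrom", "seg.mean")]

# Explicit left-hand side: only seg.mean, ID and chrom are checked for NA,
# so chrom 1 is averaged over both rows
one_col <- aggregate(seg.mean ~ ID + chrom, data = toy, FUN = mean)

print(all_cols)
print(one_col)
```

On data with no missing values, as in the question's sample, both forms return the same seg.mean column.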
M--
Cron