Find mean within data.frame

Question

I have this table:

3702    GO:0009611  0.682
3711    GO:0009611  35.418
4081    GO:0009611  18.072
3702    GO:0033554  0.400
3702    GO:0006812  0.378
3702    GO:0006412  0.373
3702    GO:0009058  0.346
3702    GO:0051641  0.312
29760   GO:0009611  28.697

I don't care about the first column. Column 2 has some repeated values. What I'd like to get is a data.frame whose first column holds each distinct value from column 2 of my table, and whose second column holds the corresponding mean of column 3.

Something like:

GO:0051179  1.7398
GO:0016311  2.1595
GO:0010467  1.45633
GO:0044093  15.483
GO:0006811  2.4175
GO:0044238  0.927667
GO:0006812  3.0138
GO:0006807  1.048

In fact, I got this output using awk:

awk '{print $2"\t"$3}' BP.txt | awk '{hash1[$1]+=$2} ; {hash2[$1]+=1} END {for (x in hash1) {print x"\t"hash1[x]/hash2[x]}}'

but I have no clue how to do this in R.

score 3 · Accepted Answer · answered Jul 03 '14 at 10:39

Just use tapply. So if you had a data frame dd, with three columns V1, V2 and V3, then

tapply(dd$V3, dd$V2, mean)

would give you what you want.
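Note that tapply returns a named array rather than a data.frame. A minimal sketch of the full round trip on the rows from the question, assuming the data was loaded with read.table so the columns get the default names V1, V2 and V3 (the names means and result, and the output column names term and mean, are only illustrative):

```r
# Rows copied from the question; read.table() assigns the default
# column names V1, V2, V3 (an assumption about how the data was loaded).
dd <- read.table(text = "
3702    GO:0009611  0.682
3711    GO:0009611  35.418
4081    GO:0009611  18.072
3702    GO:0033554  0.400
3702    GO:0006812  0.378
3702    GO:0006412  0.373
3702    GO:0009058  0.346
3702    GO:0051641  0.312
29760   GO:0009611  28.697
", stringsAsFactors = FALSE)

# Mean of V3 within each distinct value of V2; the result is a named array.
means <- tapply(dd$V3, dd$V2, mean)

# Reshape the named array into a two-column data.frame.
result <- data.frame(term = names(means), mean = as.vector(means))
print(result)
```

Here names(means) carries the group labels, and as.vector() strips the array attributes so the means end up as an ordinary numeric column.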

csgillespie

score 3 · Answer 2 · Manoj G · 2014-07-03T17:18:29.993

You could use data.table. If df is your data.frame, with the grouping column named col2 and the value column named col3, do the following:

library(data.table) ## 1.9.2+
dt <- as.data.table(df)
dt <- dt[, list(col = mean(col3)), by = col2]

edited Jul 03 '14 at 17:18

answered Jul 03 '14 at 10:43

Manoj G

@Manoj G: list(col = mean(col3)) needs its closing parenthesis. – akrun Jul 03 '14 at 11:16

score 2 · Answer 3 · answered Jul 03 '14 at 10:41

An alternative to the tapply approach from @csgillespie is the by function:

by(dd$V3, dd$V2, mean)
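by returns an object of class "by", which prints as a per-group report rather than a table. A rough sketch of turning it into a data.frame, using four rows from the question (the V1 to V3 column names follow the accepted answer's assumption; res, out, term and mean are illustrative names):

```r
# Four rows from the question, built by hand; V1..V3 mirror the
# column names assumed in the accepted answer.
dd <- data.frame(
  V1 = c(3702, 3711, 3702, 3702),
  V2 = c("GO:0009611", "GO:0009611", "GO:0033554", "GO:0006812"),
  V3 = c(0.682, 35.418, 0.400, 0.378),
  stringsAsFactors = FALSE
)

res <- by(dd$V3, dd$V2, mean)   # object of class "by", one mean per group

# The group labels are the names of the result; as.vector() drops the
# "by" class and leaves the plain numeric means.
out <- data.frame(term = names(res), mean = as.vector(res))
print(out)
```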

Jaap

score 1 · Answer 4 · answered Jul 03 '14 at 10:50

Or just use good old aggregate (assuming temp is your data set):

aggregate(V3 ~ V2, temp, mean)
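aggregate returns a data.frame directly, which is the shape the question asks for. A small sketch with five rows from the question (temp and its V1 to V3 column names follow this answer's assumption; the output names term and mean are made up for illustration):

```r
# Five rows from the question, built by hand.
temp <- data.frame(
  V1 = c(3702, 3711, 4081, 3702, 29760),
  V2 = c("GO:0009611", "GO:0009611", "GO:0009611", "GO:0033554", "GO:0009611"),
  V3 = c(0.682, 35.418, 18.072, 0.400, 28.697),
  stringsAsFactors = FALSE
)

# Formula interface: mean of V3 for each distinct value of V2.
out <- aggregate(V3 ~ V2, data = temp, FUN = mean)

# Optional: give the columns more descriptive names.
names(out) <- c("term", "mean")
print(out)
```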

David Arenburg
