how to use apply family to average multiple conditional arguments in R

Question

This is a very large dataset and I'm trying to get away from writing for loops in R. I'm looking for a way to tackle what I would usually do with a nested loop.

For each unique value in the confidence column, I need to extract the row indices of all rows in the confidence column that match that value. For example, the first value (50) would return rows 1, 7, and 9. Then, using those indices, I want to average the values in the seqs column. Here, the first value (50) would return 1980, 7357, and 3008, which are then averaged. The intended output would be a data frame with 2 columns: one with the unique values of confidence and one with the corresponding average # seqs for each unique confidence value.

input

#seqs       confidence
1980        50
1088        52
1099        52
2000        42
7009        45
1092        48
7357        50
5909        42
3008        50

output

ave.#seqs     confidence
4115          50
1093.5        52 
3954.5        42...

Linked post is about `sum`, just change it to `mean`. – zx8754 Jun 15 '17 at 21:21

score -1 · Answer 1 · answered Jun 15 '17 at 21:13

Given that it's a "very large dataset", I suggest a data.table solution.

> library(data.table)
> # assumes `data` is a data frame with columns named seqs and confidence
> setDT(data)[, mean(seqs), by = confidence]
   confidence     V1
1:         50 4115.0
2:         52 1093.5
3:         42 3954.5
4:         45 7009.0
5:         48 1092.0

Solutions using dplyr functions or aggregate would also work, but they tend to be slower on large data.
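For reference, here is a minimal base R sketch of those alternatives, plus a literal apply-family version of the steps described in the question. It assumes the column is named `seqs` (since `#seqs` is not a syntactic R name) and rebuilds the sample data from the question; the object names `res_aggregate`, `res_tapply`, and `res_manual` are just illustrative.

```r
# Sample data copied from the question; the first column is named `seqs`
# here because `#seqs` is not a syntactic R name (an assumption).
data <- data.frame(
  seqs       = c(1980, 1088, 1099, 2000, 7009, 1092, 7357, 5909, 3008),
  confidence = c(50, 52, 52, 42, 45, 48, 50, 42, 50)
)

# aggregate(): returns a data frame with one row per confidence value.
# Groups come back sorted by confidence, not in order of first appearance.
res_aggregate <- aggregate(seqs ~ confidence, data = data, FUN = mean)

# tapply(): the apply-family equivalent; returns group means named by
# the confidence value.
res_tapply <- tapply(data$seqs, data$confidence, mean)

# Literal version of the steps in the question: for each unique confidence
# value, find the matching row indices and average seqs over those rows.
vals <- unique(data$confidence)
res_manual <- data.frame(
  ave.seqs   = sapply(vals, function(v) mean(data$seqs[which(data$confidence == v)])),
  confidence = vals
)
print(res_manual)
```

The `sapply` version keeps the groups in order of first appearance (50, 52, 42, ...), which matches the layout of the output in the question, while `aggregate` and `tapply` sort by confidence.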

Yannis Vassiliadis

Is it traditional to downvote valid answers to downvoted questions? That doesn't make much sense to me. – svenhalvorson Jun 15 '17 at 22:22
