3

I have a data set for racing performance in horses over several years and I want to calculate the age at which the horses reach their peak performance. Here is a made up example of my data:

data <- data.frame(
Name=c(rep("Ari",3),rep("Aegir",3),rep("Lixhof",3)),
Competition.year = c("2015", "2013", "2012", "2008", "2009", "2010", "2015", "2016", "2017"), 
P2=c(7.97, 8.40, 8.51, 9.49, 8.70, 8.40, 8.82, 9.07, 8.59),
Competition.age=c(16,14,13,8,9,10,12,13,14))

Here, P2 is the variable for the time records. The smaller the value, the better performance (I'm looking for fastest times to calculate peak performance). Competition age shows what age (in years) each horse was for each year they competed.

My real data has around 2000 observations for 127 horses. What I want is to calculate the mean age for when they reach their peak performance (as in, at what age are horses, in general, fastest). I've seen some posts that use aggregate to calculate means by groups, but I don't think that's exactly what I need, since it has to first look at the times, then make a mean of the ages from the fastest one.

I'd appreciate any help with this! Thank you!

Karolis Koncevičius
  • 9,417
  • 9
  • 56
  • 89

3 Answers3

2

Given your example you can use something like this.

library(dplyr)

df_min <- df %>% 
  group_by(Name) %>% 
  filter(P2 == min(P2)) # filter records on fastest race time per horse

mean(df_min$Competition.age)
[1] 13.33333

As @MKR pointed out, you can also do it in one statement. It is slightly more typing and you do not have the intermediate result of df_min. It all depends on what else you want to do with the data you have.

df_min <- df %>% 
  group_by(Name) %>% 
  filter(P2 == min(P2)) %>% 
  ungroup() %>% 
  summarise(best_age = mean(Competition.age)) 
phiver
  • 23,048
  • 14
  • 44
  • 56
  • Thanks! It seems to be working fine! – Laura Bas Apr 29 '18 at 10:54
  • @phiver You could have done in single sentence as: df %>% group_by(Name) %>% filter(P2 == min(P2)) %>% ungroup() %>% summarise(best_age = mean(Competition.age)) – MKR Apr 29 '18 at 11:27
  • 1
    @MKR, I will add it to the answer, but it is more typing and you do not keep the intermediate result. – phiver Apr 29 '18 at 11:45
2

We can calculate the average using data.table by first filtering for horses's age with maximum performance (min(P2)) and then then taking mean of Competition.age as:

library(data.table)
setDT(data)

data[,.SD[P2 == min(P2)], by=.(Name)][,mean(Competition.age)]
#[1] 13.33333
MKR
  • 19,739
  • 4
  • 23
  • 33
0

For the sake of completeness here is the solution with aggregate:

best <- aggregate(P2 ~ Name, data = data, FUN = min)
best <- merge(data, best)
mean(best$Competition.age)
#> [1] 13.33333

Short explanation:

  1. Select minimum P2 values by name
  2. Merge reduced dataset with the original data
  3. Take the mean of age.
Karolis Koncevičius
  • 9,417
  • 9
  • 56
  • 89
  • Or using `ave`: `with(data, mean(Competition.age[as.logical(ave(P2, Name, FUN = function(x) x == min(x)))]))` – Henrik Apr 29 '18 at 12:33