2

I have kids' personal best times listed per week per stroke. For kid A, how can I calculate that kid's means for all previous weeks, excluding current week's results? For example, one kid's results might look like this:

df.trial <- data.frame(Week= c("w10", "w9", "w9", "w5", "w5", "w5", "w6", "w6", "w3"), Stroke= c("Fly","Free","Breast","Back","Free","Breast","Fly","Back","Free"),  Score = c(5.5,4.5,4.6,5.2, 4.3, 5.7, 4.7,5.5,4.8))

I'm trying to add a new column to calculate the means like this:

df.desired <- data.frame(Week= c("w10", "w9", "w9", "w5", "w5", "w5", "w6", "w6", "w3"), Stroke= c("Fly","Free","Breast","Back","Free","Breast","Fly","Back","Free"), Score = c(5.5, 4.5, 4.6, 5.2, 4.3, 5.7, 4.7, 5.5, 4.8), prev.mean = c(4.91, 5.03, 5.03, 4.80, 4.80, 4.80, 5.00, 5.00, NA))

The data I have has lots of kids so I will need to group by each student's name. Also, if there are no previous weeks then I'd like to add an NA.

I have tried a few different ways and have tried to write a function like the rollsum one described here: R previous week total

No luck so far. Any suggestions please?

Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
Leila
  • 27
  • 7
  • It's helpful if you can include some of the code that hasn't worked, so we can help you improve upon it and see what direction you're going in. – camille Mar 04 '19 at 15:47

1 Answers1

0

EDIT: new answer

I misunderstood your question, I think this is what you are looking for. Note that I added a column kid_id in your data.frame to show you a solution which is applicable for multiple ids.

Also note that I have to convert Weeks into numeric so I can arrange for Weeks, otherwise the arrangement won't be correct.

library(tidyverse)

df.trial <- data.frame(Kid_id = 1, Week= c("w10", "w9", "w9", "w5", "w5", "w5", "w6", "w6", "w3"), Stroke= c("Fly","Free","Breast","Back","Free","Breast","Fly","Back","Free"),  Score = c(5.5,4.5,4.6,5.2, 4.3, 5.7, 4.7,5.5,4.8))

df.trial %>% as_tibble() %>% 
  mutate(Week = as.numeric(gsub("w", "", as.character(Week))), 
         kid_id = 1) %>% 
  group_by(kid_id, Week) %>% 
  arrange(kid_id, Week) %>% 
  summarise(sum_score = sum(Score),
            n_score = n()) %>% 
  mutate(prev.mean = map_dbl(seq_along(sum_score), ~sum(sum_score[1:.x - 1]) / sum(n_score[1:.x - 1])),
         Week = paste0("w", Week)) %>% 
  ungroup() %>% 
  select(kid_id, Week, prev.mean) %>% 
  left_join(df.trial, ., by = "Week")


  Week Stroke Score kid_id prev.mean
1  w10    Fly   5.5      1  4.912500
2   w9   Free   4.5      1  5.033333
3   w9 Breast   4.6      1  5.033333
4   w5   Back   5.2      1  4.800000
5   w5   Free   4.3      1  4.800000
6   w5 Breast   5.7      1  4.800000
7   w6    Fly   4.7      1  5.000000
8   w6   Back   5.5      1  5.000000
9   w3   Free   4.8      1       NaN

Here you have NaNinstead of NA but they can easily be replaced if desired.

Cettt
  • 11,460
  • 7
  • 35
  • 58
  • Thank you! That is great! I had tried dplyr but didn't come across the map_dbl. Your solution is spot on and works with multiple kids too when I change the kid_id = 1 to kid_id = kid_id. Thanks again for your help! – Leila Mar 05 '19 at 09:51