
I have a dataframe similar to the following:

site | date | risk
A      12/31  4
B      12/31  3
C      12/31  2
A      1/1    3
B      1/1    4
C      1/1    8
A      1/2    4
B      1/2    5
C      1/2    6

I want to calculate the average risk for each site after 12/31. I would like my output table to look like the following:

site | risk
A      3.5
B      4.5
C      7

I also have more columns in my original dataframe, but I do not need them for this metric. Any suggestions?

mumair
  • Take a look at the dplyr package. It makes data manipulation and aggregation like this very easy. – Chris Jun 19 '18 at 22:13
  • Possible duplicate of: [how to calculate mean/median per group in a dataframe in r](https://stackoverflow.com/questions/25198442/how-to-calculate-mean-median-per-group-in-a-dataframe-in-r) – markus Jun 19 '18 at 22:18
  • It seems risky to ignore the year. Consider converting the column to a date first, with explicit code showing how you decided which year each month and day belongs to. – wibeasley Jun 19 '18 at 23:33
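
To illustrate the last comment, here is a minimal base R sketch of attaching an explicit year before filtering. The years are not given in the question, so the rule used here (December rows belong to 2017, everything else to 2018) is purely an assumption, as are the names `date_full` and `cutoff`.

```r
# Sample data from the question, rebuilt so the block is self-contained
df <- data.frame(
  site = rep(c("A", "B", "C"), times = 3),
  date = rep(c("12/31", "1/1", "1/2"), each = 3),
  risk = c(4, 3, 2, 3, 4, 8, 4, 5, 6),
  stringsAsFactors = FALSE
)

# ASSUMPTION: December rows are from 2017, all other rows from 2018
month <- as.integer(sub("/.*", "", df$date))
year <- ifelse(month == 12, 2017, 2018)

# Build a real Date column so "after 12/31" becomes an actual date comparison
df$date_full <- as.Date(paste(year, df$date, sep = "/"), format = "%Y/%m/%d")

cutoff <- as.Date("2017-12-31")
after <- df[df$date_full > cutoff, ]

# Mean risk per site for rows strictly after the cutoff (a named vector)
result <- tapply(after$risk, after$site, mean)
result
```

With a proper `Date` column, the filter no longer depends on string matching and still behaves sensibly if the data later spans more than one year.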

1 Answer


Here is a tidyverse possibility to get you started:

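library(tidyverse)
# df is defined under "Sample data" below
df %>%
    filter(date != "12/31") %>%    # drop the 12/31 rows, keeping 1/1 and 1/2
    group_by(site) %>%             # one group per site
    summarise(risk.mean = mean(risk))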
## A tibble: 3 x 2
#  site  risk.mean
#  <fct>     <dbl>
#1 A          3.50
#2 B          4.50
#3 C          7.00
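
If you would rather avoid loading the tidyverse, the same filter-then-aggregate idea can be sketched in base R with `subset()` and `aggregate()`. This is only an alternative sketch, not part of the approach above; the names `kept` and `base_result` are made up for illustration.

```r
# Same sample data as in the question
df <- read.table(text = "site date risk
A 12/31 4
B 12/31 3
C 12/31 2
A 1/1 3
B 1/1 4
C 1/1 8
A 1/2 4
B 1/2 5
C 1/2 6", header = TRUE)

# Drop the 12/31 rows, then average risk within each site
kept <- subset(df, date != "12/31")
base_result <- aggregate(risk ~ site, data = kept, FUN = mean)
base_result
```

`aggregate()` returns an ordinary data frame with one row per site, so the result can be merged back or printed without any further conversion.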

Sample data

df <- read.table(text =
    "site  date  risk
A      12/31  4
B      12/31  3
C      12/31  2
A      1/1    3
B      1/1    4
C      1/1    8
A      1/2    4
B      1/2    5
C      1/2    6", header = TRUE)
Maurits Evers