Apologies if the question isn't formulated correctly in the title, I am fairly new to this and still not used to the exact terminology.
I need to add a column to a data frame that contains the result of operations (e.g., mean, sum, etc.) grouped by values in other columns. At the same time, I would not like to lose any of the data in the data frame. However, I haven't been able to find a function that does that, or a solution of any kind.
Let me illustrate with an example:
df
Name Round Points
1 John 1 6
2 John 2 4
3 John 3 6
4 John 4 8
5 Annie 1 7
6 Annie 2 4
7 Annie 3 3
8 Annie 4 6
9 Bill 1 4
10 Bill 2 5
11 Bill 3 8
12 Bill 4 5
I would like to add a column at to the dataframe that contains the average points of each player displayed in each row:
Name Round Points Avg.points
1 John 1 6 6.0
2 John 2 4 6.0
3 John 3 6 6.0
4 John 4 8 6.0
5 Annie 1 7 5.0
6 Annie 2 4 5.0
7 Annie 3 3 5.0
8 Annie 4 6 5.0
9 Bill 1 4 5.5
10 Bill 2 5 5.5
11 Bill 3 8 5.5
12 Bill 4 5 5.5
However, if I use group_by(Name)
I lose part of the data:
df <- df %>%
group_by(Name) %>%
summarise(Avg.points = mean(Points, na.rm = TRUE))
df
# A tibble: 3 × 2
Name Avg.points
<chr> <dbl>
1 Annie 5
2 Bill 5.5
3 John 6
If I use group_by(Name, Round, Points)
I don't get the correct average:
df <- df %>%
group_by(Name, Round, Points) %>%
summarise(Avg.points = mean(Points, na.rm = TRUE))
df
# A tibble: 12 × 4
# Groups: Name, Round [12]
Name Round Points Avg.points
<chr> <dbl> <dbl> <dbl>
1 Annie 1 7 7
2 Annie 2 4 4
3 Annie 3 3 3
4 Annie 4 6 6
5 Bill 1 4 4
6 Bill 2 5 5
7 Bill 3 8 8
8 Bill 4 5 5
9 John 1 6 6
10 John 2 4 4
11 John 3 6 6
12 John 4 8 8
Thanks to anyone who could help me out with this.