I have the following problem. My dataset contains information about tennis players and the number of games they played in each season of their active careers.

Name Season Games
Nadal 2015 84
Novak 2017 14
Nadal 2016 88
Federer 2018 75
Nadal 2010 45
.
.
.

I would like to create a new dataset that includes only players who played for five or more years.

I suppose I have to somehow count the rows for each player and then filter on that count. How can I do this?

rama27
What have you tried so far, and have you looked at any questions that have already been posted? You may find this thread helpful: https://stackoverflow.com/questions/9809166/count-number-of-rows-within-each-group – Andrew Dec 06 '19 at 18:57

1 Answer

Using dplyr, you can count the rows in each group and filter the data frame on that count. For example, take this dummy data frame:

df = data.frame(P = c("A","A","A","A","A","A","A","B","B","C","C","C","C"),
                y = c(1,4,5,8,7,4,2,3,4,8,7,4,1))


library(dplyr)
# add_count() appends the per-group row count as column n; keep groups with at least 5 rows
df %>% group_by(P) %>% add_count(P) %>% filter(n >= 5)
# A tibble: 7 x 3
# Groups:   P [1]
  P         y     n
  <fct> <dbl> <int>
1 A         1     7
2 A         4     7
3 A         5     7
4 A         8     7
5 A         7     7
6 A         4     7
7 A         2     7

With your dataframe, you can try:

df %>% group_by(Name) %>% add_count(Name) %>% filter(n >= 5)
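
Note that the line above counts rows per player, which equals the number of seasons only if each player has exactly one row per season. If a season can appear more than once for a player, a variant that counts distinct seasons may be safer. The following is a minimal sketch under that assumption: the `tennis` data frame and the `long_careers` name are hypothetical, and most of the values are made up for illustration (only three of the Nadal rows come from the question).

```r
library(dplyr)

# Hypothetical sample data in the shape described in the question.
# Nadal has a duplicated 2016 row on purpose: 6 rows, but 5 distinct seasons.
tennis <- data.frame(
  Name   = c(rep("Nadal", 6), rep("Novak", 2), rep("Federer", 5)),
  Season = c(2010, 2011, 2012, 2015, 2016, 2016,
             2017, 2018,
             2014, 2015, 2016, 2017, 2018),
  Games  = c(45, 60, 70, 84, 88, 3,
             14, 50,
             70, 72, 68, 71, 75)
)

# Keep every row of each player who appears in at least five distinct seasons.
long_careers <- tennis %>%
  group_by(Name) %>%
  filter(n_distinct(Season) >= 5) %>%
  ungroup()

print(long_careers)
```

Here `filter()` is evaluated once per group, so `n_distinct(Season)` is the number of different seasons for that player, and no helper count column is left in the result.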
dc37