0

The dataset is available on Kaggle: https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results/

I need to use dplyr to create an object which contains, for each combination of Sex and Season, the number of different sports in the data set.

I first group the dataset with sex, season and sport, summarized them, which gives me a table with too many rows, with column Sex, Season and Sport. This is not right. Then I used n() in the summarize function, which returned me the same result, only with one more column: number of people

final_group<- group_by(dataset, Sex, Season)
final_group_1 <- summarise(final_group)

then i tried:

final_group<- group_by(dataset, Sex, Season)
final_group_1 <- summarise(final_group, n())

both did not return what I want.

I only want 4 rows with the sums of all sports played in the summer or winter by each gender, like the example shown below:

Sex       Season        Num_sports
Male      summer           ( all sports played by males in the summer )
Male      winter           ( all sports played by males in the winter )
Female    summer           ( all sports played by females in the summer )
Female    winter           ( all sports played by females in the winter )
  • Can you post sample data? Please edit **the question** with the output of `dput(dataset)`. Or, if it is too big with the output of `dput(head(dataset, 20))`. – Rui Barradas May 06 '19 at 15:53
  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and the desired output for that input that can be used to test and verify possible solutions. – MrFlick May 06 '19 at 15:53
  • `dplyr::filter()` + `table()` – bbiasi May 06 '19 at 15:57

1 Answers1

1

Without a reprex it's impossible to know exactly how your data looks but something like this should give you a desired answer:

library(tidyverse)

dat <- read_csv("~/Desktop/athlete_events.csv")

dat %>% 
    count(Sex, Season)
#> # A tibble: 4 x 3
#>   Sex   Season      n
#>   <chr> <chr>   <int>
#> 1 F     Summer  59443
#> 2 F     Winter  15079
#> 3 M     Summer 163109
#> 4 M     Winter  33485

If you want number of unique sports played by Men/Women for each season you can do something like this:

dat %>% 
    group_by(Sex, Season) %>% 
    summarise(num_sports_played = length(unique(Sport)))
#> # A tibble: 4 x 3
#> # Groups:   Sex [2]
#>   Sex   Season num_sports_played
#>   <chr> <chr>              <int>
#> 1 F     Summer                40
#> 2 F     Winter                14
#> 3 M     Summer                49
#> 4 M     Winter                17

Created on 2019-05-06 by the reprex package (v0.2.1)

dylanjm
  • 2,011
  • 9
  • 21
  • Thank you, I just updated the question, could you please take a look? I tried the code you provided, but it returns the same result that I used with the second part of my code. It returns the number of people who played the sport, instead of the sums of the sport. –  May 06 '19 at 19:14
  • @VanLindert please accept my answer so others may know your question was answered to satisfaction. Thanks. – dylanjm May 06 '19 at 20:01
  • Hi Dylan, thanks for your reminder! I have one more question, just for my curiosity: is it also possible to do the same with base R? I tried but seems like base R don't really have a descent function for grouping. –  May 07 '19 at 10:07