Finding total frequency for each variable

Question

I have a data frame with a release_year column giving the year a song was released and a playcount column giving how many times that song was played in a given year. Here's a reproducible example:

release_year <- c(1955, 1972, 1955, 2014, 1972)
playcount <- c(15, 2, 90, 6, 9)
df <- data.frame(release_year, playcount)
df

How would I tidy up the data so that each year shows up only once and the total playcount is given for that year? For example, for 1955, I'll have 105 and for 1972 I'll have 11. I have tried the following code using tidyr:

gather(key = release_year, value = frequency, `1955`:`2014`)

but an error says the object is not found. Is there a better function than gather() that I should use here?

score 1 · Answer 1 · answered Dec 01 '17 at 05:21

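gather() reshapes wide data into long format; it doesn't aggregate, and df has no columns named `1955` through `2014`, hence the error. What you want is a grouped sum, so you can try the dplyr approach: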

library(dplyr)

# Group rows by release year, then sum the play counts within each group
df %>%
  group_by(release_year) %>%
  summarise(playcount = sum(playcount))

# A tibble: 3 x 2
#  release_year playcount
#          <dbl>     <dbl>
#1         1955       105
#2         1972        11
#3         2014         6

tushaR

score 0 · Answer 2 · answered Dec 01 '17 at 05:40

You can just use the count function from dplyr (no need for tidyr). With the wt argument, it sums playcount within each release_year instead of counting rows:

library(dplyr)
count(df, release_year, wt = playcount)
#> # A tibble: 3 x 2
#>   release_year     n
#>          <dbl> <dbl>
#> 1         1955   105
#> 2         1972    11
#> 3         2014     6
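
For comparison, here is a minimal base R sketch that is not part of either answer above. It assumes the same df as in the question and uses aggregate() with a formula to sum playcount per release_year, so no packages need to be loaded. The name totals is only illustrative.

```r
# Same example data as in the question
df <- data.frame(
  release_year = c(1955, 1972, 1955, 2014, 1972),
  playcount    = c(15, 2, 90, 6, 9)
)

# Sum playcount within each release_year using base R only
totals <- aggregate(playcount ~ release_year, data = df, FUN = sum)
totals
#>   release_year playcount
#> 1         1955       105
#> 2         1972        11
#> 3         2014         6
```

The result is a plain data frame with one row per year, which matches the totals asked for in the question (105 for 1955 and 11 for 1972).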
