
I have a data frame with a release_year column denoting the year a song was released and a playcount column denoting how many times that song was played in a given year. Here's a reproducible example:

release_year = c(1955, 1972, 1955, 2014, 1972) 
playcount = c(15, 2, 90, 6, 9) 
df = data.frame(release_year, playcount)   
df

How would I tidy up the data so that each year shows up only once and the total playcount is given for that year? For example, for 1955, I'll have 105 and for 1972 I'll have 11. I have tried the following code using tidyr:

gather(key = release_year, value = frequency, `1955`:`2014`)

but an error says the object is not found. Is there a better function than gather() that I should use here?

Ronak Shah
Jin Yu Li

2 Answers


You can try the dplyr approach:

library(dplyr)
df %>% group_by(release_year) %>% summarise(playcount = sum(playcount))

# A tibble: 3 x 2
#  release_year playcount
#          <dbl>     <dbl>
#1         1955       105
#2         1972        11
#3         2014         6
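
For comparison, the same per-year total can be sketched in base R with aggregate(), with no extra packages. This is an alternative technique rather than part of the dplyr approach, and it rebuilds the example data from the question so it runs on its own. gather() is likely the wrong tool here because df is already in long form and has no columns named `1955` through `2014`, so the task is an aggregation rather than a reshape.

```r
# Example data, copied from the question
df <- data.frame(
  release_year = c(1955, 1972, 1955, 2014, 1972),
  playcount    = c(15, 2, 90, 6, 9)
)

# Sum playcount within each release_year using the formula
# interface of stats::aggregate (base R, no packages needed)
totals <- aggregate(playcount ~ release_year, data = df, FUN = sum)
totals
```

The result is a plain data frame with one row per release_year, which should match the tibble shown above.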
tushaR

You can just use the count function from dplyr (no need for tidyr):

library(dplyr)
count(df, release_year, wt = playcount)
#> # A tibble: 3 x 2
#>   release_year     n
#>          <dbl> <dbl>
#> 1         1955   105
#> 2         1972    11
#> 3         2014     6
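
To see what the wt argument is doing, here is a minimal sketch that compares count() with an explicit group_by() and summarise(). It assumes dplyr is installed and reuses the example data from the question; the names by_count and by_summarise are only for illustration.

```r
library(dplyr)

# Example data, copied from the question
df <- data.frame(
  release_year = c(1955, 1972, 1955, 2014, 1972),
  playcount    = c(15, 2, 90, 6, 9)
)

# Weighted count: sums playcount within each release_year into column n
by_count <- count(df, release_year, wt = playcount)

# The same computation written out by hand
by_summarise <- df %>%
  group_by(release_year) %>%
  summarise(n = sum(playcount))

# Compare as plain data frames so the tibble class does not matter
same <- isTRUE(all.equal(as.data.frame(by_count), as.data.frame(by_summarise)))
same
```

Without wt, count() would return the number of rows per year instead of the summed playcount.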
markdly