
The sample data is shown below:

n period age
15 1991 5
20 1991 5
16 1991 15
29 1991 15
77 1991 25
44 1991 25

I use the following code to get the sum from the data grouped by period and age:

#The name of dataset is a.
a %>% group_by(period,age)%>%
      mutate(n = sum(n))

But the result is:

n period age
35 1991 5
35 1991 5
45 1991 15
45 1991 15
121 1991 25
121 1991 25

Why are there duplicate rows? Is it because the sum is written to every element in each group?

user438383
doraemon

1 Answer

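You need to use the summarize() function. mutate() keeps every original row and writes the group sum into each of them, which is why the totals appear repeated; summarize() collapses each group into a single row. Here's a reproducible example: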

##Load dplyr; if it is not installed, install it and then load it##
if (!require(dplyr)) {
  install.packages("dplyr")
  library(dplyr)
}

##Creating the data##
n<-c(15,20,16,29,77,44)
period<-rep(1991, 6)
age<-c(5,5,15,15,25,25)

a<-data.frame(n=n, period=period, age=age)

##Calculation with summarize()##
a %>%
  group_by(period, age) %>%
  summarize(n = sum(n))
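
To make the difference concrete, here is a minimal sketch that runs both verbs on the same data and compares the row counts. The object names with_mutate, with_summarise and deduplicated are only illustrative, and the .groups = "drop" argument assumes dplyr 1.0.0 or later; on older versions it can be left out and ungroup() called afterwards.

```r
library(dplyr)

# Same sample data as in the question.
a <- data.frame(
  n      = c(15, 20, 16, 29, 77, 44),
  period = rep(1991, 6),
  age    = c(5, 5, 15, 15, 25, 25)
)

# mutate() keeps all six rows; each row receives its group's total.
with_mutate <- a %>%
  group_by(period, age) %>%
  mutate(n = sum(n)) %>%
  ungroup()

# summarise() returns one row per (period, age) group.
# .groups = "drop" assumes dplyr >= 1.0.0.
with_summarise <- a %>%
  group_by(period, age) %>%
  summarise(n = sum(n), .groups = "drop")

# If the mutate() form is kept, distinct() removes the repeated rows.
deduplicated <- with_mutate %>% distinct()

nrow(with_mutate)     # 6
nrow(with_summarise)  # 3
nrow(deduplicated)    # 3
```

mutate() is still the right choice when the group total is needed next to the original rows, for example to compute each row's share of its group; in that case a different column name avoids overwriting n.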
Sean McKenzie