-2

Let's say I have a dataframe like this:

df=data.frame("A"=factor(c(1,2,1,4,2)), "date"=factor(c("1999","2000","1999","2001","2001")), "value"=c(10,20,30,40,50))

I need to sum the values in the column "value" if they have the same "A" and "date". So what I need is a dataframe like this:

dfnew=data.frame("A"=factor(c(1,2,1,4,2)), "date"=factor(c("1999","2000","1999","2001","2001")), "value"=c(10,20,30,40,50), "sum"=c(40,20,40,40,50))

I can do it with a loop, but it is very slow since my dataset is big. Is there any way to do it faster?

Reabo
  • 87
  • 11
  • 1
    Do a group by sum `df %>% group_by(A, date) %>% mutate(sum = sum(value))` or iin `base R` `df$sum <-with(df, ave(value, A, date, FUN = sum))` or wiith `data.table` for faster `library(data.table); setDT(df)[, sum := sum(value), .(A, date)]` – akrun Nov 27 '19 at 20:53

3 Answers3

2

Using dplyr, you can simply do:

library(dplyr)
df %>% group_by(date,A) %>% mutate(Sum = sum(value))

# A tibble: 5 x 4
# Groups:   date, A [4]
  A     date  value   Sum
  <fct> <fct> <dbl> <dbl>
1 1     1999     10    40
2 2     2000     20    20
3 1     1999     30    40
4 4     2001     40    40
5 2     2001     50    50

dc37
  • 15,840
  • 4
  • 15
  • 32
2

We can use data.table

library(data.table)
setDT(df)[, sum := sum(value), .(date, A)]

Or with base R

df$sum <- with(df, ave(value, A, date, FUN = sum))
akrun
  • 874,273
  • 37
  • 540
  • 662
2

A base R solution:

aggregate(value ~ A + date, data=df, sum)
  A date value
1 1 1999    40
2 2 2000    20
3 2 2001    50
4 4 2001    40
G5W
  • 36,531
  • 10
  • 47
  • 80