sum numeric variable six months ago

Question

i have a data base with dates and numeric variables. I also have multiple rows per id. It looks like this:

ID	date	x
1	2019-01-01	3
1	2018-12-01	4
1	2017-11-01	1
1	2017-10-01	2
1	2017-09-01	2
1	2017-08-01	2

I need to sum x up to six month ago from date, so i tried this

library(lubridate)
    mutate(semester= semester(fecha_inicio,with_year = TRUE)) %>%
  group_by(ID,semester) %>%
  mutate(sum_semester = sum(x, na.rm = TRUE))

but is not what i need because 2019-01-01 have 3 instead of 14.

Please help.

What are you piping to `mutate`? It's the data, yes, but it's missing from the code. — Rui Barradas, Jan 07 '22 at 18:17

score 0 · Answer 1 · answered Jan 07 '22 at 19:12

0

I found the answer here Cumulative sum from a month ago until the current day for all the rows adapting the code:

library(tidyverse)
library(lubridate)
data <- data %>%
  group_by(ID) %>%
  mutate(sum_6m = map_dbl(1:n(), ~ sum(x[(date>= (date[.] - months(5))) &
                                                   (date<= date[.])], na.rm = TRUE)))

answered Jan 07 '22 at 19:12

JOSE ANTONIO RUIZ TAGLE MATURA

21
2

I think your code is not right. – huckfinn Jan 07 '22 at 20:49

huckfinn · Answer 2 · 2022-01-07T22:13:25.980

Using the classic way to aggregate datasets x ~ id by a function like sum with a normal data.frame filter it could be the following code.

Code

library(lubridate)

# Define your data
data <- 
"id date x
1 2019-01-01 3
1 2018-12-01 4
1 2017-11-01 1
1 2017-10-01 2
1 2018-02-12 2
1 2017-09-01 2
"
# Read the table 
tab <- read.csv(text=data, header = T, sep=' ')

# Find the youngest date
top.date <- as.Date(max(tab$date))

# Calculate the threshold (before and after point) of 6 month
thresh   <- top.date) %m-% months(6)

# Calculate the sums over the ID's after the point date
after.thresh  <- aggregate(x ~ id, 
                           data = tab[as.Date(tab$date) >= thresh,], 
                           FUN  = sum)

# Calculate the sums over the ID's before the point date
before.thresh <- aggregate(x ~ id, 
                           data = tab[as.Date(tab$date) < thresh,], 
                           FUN=sum)

# Print the dates 
cat("TOP.DATE.IS:", format_ISO8601(top.date),
    " THRESH.DATE.IS:", format_ISO8601(thresh),"\n")

# Print the sums 
cat("SUM.BEFORE.THRESH:", after.thresh$x, 
    "SUM.AFTER.THRESH:", before.thresh$x,"\n")

Results

TOP.DATE.IS: 2019-01-01  THRESH.DATE.IS: 2018-07-01 
SUM.BEFORE.THRESH: 7 SUM.AFTER.THRESH: 7

sum numeric variable six months ago

2 Answers2

Code

Results