0

I'm trying to create a new variable which equals the latest month's value minus the previous month's (or 3 months prior, etc.).

A quick df:

country <- c("XYZ", "XYZ", "XYZ")
my_dates <- c("2021-10-01", "2021-09-01", "2021-08-01")
var1 <- c(1, 2, 3)
df1 <- country %>% cbind(my_dates) %>% cbind(var1) %>% as.data.frame()
df1$my_dates <- as.Date(df1$my_dates)
df1$var1 <- as.numeric(df1$var1)

For example, I've tried (partially from: How to subtract months from a date in R?)

library(tidyverse)
df2 <- df1 %>%
 mutate(dif_1month = var1[my_dates==max(my_dates)] -var1[my_dates==max(my_dates) %m-% months(1)]

I've also tried different variations of using lag():

df2 <- df1 %>%
 mutate(dif_1month = var1[my_dates==max(my_dates)] - var1[my_dates==max(my_dates)-lag(max(my_dates), n=1L)])

Any suggestions on how to grab the value of a variable when dates equal the second latest observation?

Thanks for help, and apologies for not including any data. Can edit if necessary.

Edited with a few potential answers:

#this gives me the value of var1 of the latest date

df2 <- df1 %>% 
  mutate(value_1month = var1[my_dates==max(my_dates)])

#this gives me the date of the second latest date

df2 <- df1 %>% 
  mutate(month1 = max(my_dates) %m-%months(1))

#This gives me the second to latest value

df2 <- df1 %>% 
  mutate(var1_1month = var1[my_dates==max(my_dates) %m-%months(1)])

#This gives me the difference of the latest value and the second to last of var1

df2 <- df1 %>% 
  mutate(diff_1month = var1[my_dates==max(my_dates)] - var1[my_dates==max(my_dates) %m-%months(1)])
hraw45
  • 3
  • 3

1 Answers1

1

mutate requires the output to be of the same length as the number of rows of the original data. When we do the subsetting, the length is different. We may need ifelse or case_when

library(dplyr)
library(lubridate)
df1 %>%
    mutate(diff_1month = case_when(my_dates==max(my_dates) ~ 
           my_dates %m-% months(1)))

NOTE: Without a reproducible example, it is not clear about the column types and values


Based on the OP's update, we may do an arrange first, grab the last two 'val' and get the difference

df1 %>% 
  arrange(my_dates) %>%
  mutate(dif_1month = diff(tail(var1, 2)))
    .   my_dates var1 dif_1month
1 XYZ 2021-08-01    3         -1
2 XYZ 2021-09-01    2         -1
3 XYZ 2021-10-01    1         -1
akrun
  • 874,273
  • 37
  • 540
  • 662