0

Assuming I have a data table (not data frame) that has columns of Year, Month, Day, Sales, and a new column Sales Last Year (Similar to the picture below), how do I create that Sales Last Year column to grab the Sales from the same day and month but one year ago?

Data

dt <- data.table(Month = rep("January", times = 3),
                 Year = 2016:2018,
                 Day = rep(1, times = 3),
                 Sales = c(5000, 1000, 2000))

Expected result enter image description here

jazzurro
  • 23,179
  • 35
  • 66
  • 76
Fiboniz
  • 37
  • 6
  • Welcome to SO! I've provided an answer below, but I wanted to also provide you with a little guidance for the next question you post. The SO community has a few rules [rules](https://stackoverflow.com/help/on-topic) and [norms](https://stackoverflow.com/help/how-to-ask), and following them will help you get a good answer to your question. – DanY Aug 10 '18 at 23:49
  • In addition, you might want to review [this page](https://stackoverflow.com/help/mcve) for how to provide an MCVE (a minimum, complete, and verifiable example). There's also [this page](https://stackoverflow.com/a/5963610/4573108) that offers guidance for R-specific MCVEs. Finally, it's best to avoid using images of code/data and [here's why](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557). – DanY Aug 10 '18 at 23:49

1 Answers1

1

Generate fake data to work with:

library(data.table)
dt <- data.table(
    day   = rep(1,3),
    month = rep(1, 3),
    year  = 2016:2018,
    sales = 1:3*500
)

A "merge" approach:

dtcopy <- copy(dt)
dtcopy[ , year := year + 1]
setnames(dtcopy, old="sales", new="sales_last_year")
merge(dt, dtcopy, by=c("month", "day", "year"), all.x=TRUE)

A "lag" approach:

dt <- dt[order(day, month, year)]
dt[ , sales_last_year := shift(sales), by=.(day, month)]
DanY
  • 5,920
  • 1
  • 13
  • 33
  • 1
    The "lag" approach will fail if there are any gaps in the sequence of years by day and month. "merge" is much safer. – Uwe Aug 11 '18 at 06:10
  • 1
    An _update join_ will also do `dt[dt[, .(day, month, year = year + 1, sales)], on = c("month", "day", "year"), sales_last_year := i.sales][]` – Uwe Aug 11 '18 at 06:19
  • The data.table folks need to re-publish their "join" vignette. I'm always hesitant to use the `dt1[dt2]` style because I don't have a good enough sense of what exactly is happening. For example, when can I use `x.sales` versus `i.sales` (maybe these are the same, but I can't find documentation that says so!)? – DanY Aug 11 '18 at 18:31