0

I have a problem with some sales data where my date-variable have irregular "jumps", because there are times where a product did not sell in some of the months during a year.

For example:

Data        Product     Sales  
01-2016     X           10
02-2016     X           20
06-2016     X           30
01-2016     Y           40
07-2016     Y           50

How do I transform this data, so it combines all dates with all products?

Data        Product     Sales  
01-2016     X           10
02-2016     X           20
03-2016     X           0
04-2016     X           0
05-2016     X           0
06-2016     X           30
07-2016     X           0
08-2016     X           0
09-2016     X           0
10-2016     X           0
11-2016     X           0
12-2016     X           0
01-2016     Y           40
02-2016     Y           0
03-2016     Y           0
04-2016     Y           0
05-2016     Y           0
06-2016     Y           0
07-2016     Y           50
08-2016     Y           0
09-2016     Y           0
10-2016     Y           0
11-2016     Y           0
12-2016     Y           0

I.e. inserting the missing observations and assigning a zero to the Sales-variable?

Hbrandi
  • 171
  • 2
  • 14

2 Answers2

1

One option is to use the package padr with the function pad_cust for a customized padding of the data. Function pad would only fill in the missing dates from the available dates.

First we need to create a date object, then it is just a question of using the correct functions.

library(padr)

# create date object
df$Data <- as.Date(paste("01-", df$Data, sep = ""), "%d-%m-%Y")

#pad data with custom span option and fill na's with 0
df <- pad_cust(df, span_date("2016", "2017", by = "month"), group = "Product")
df <- fill_by_value(df, value = 0)

df
         Data Product Sales
1  2016-01-01       X    10
2  2016-02-01       X    20
3  2016-03-01       X     0
4  2016-04-01       X     0
5  2016-05-01       X     0
6  2016-06-01       X    30
7  2016-07-01       X     0
8  2016-08-01       X     0
9  2016-09-01       X     0
10 2016-10-01       X     0
11 2016-11-01       X     0
12 2016-12-01       X     0
13 2016-01-01       Y    40
14 2016-02-01       Y     0
15 2016-03-01       Y     0
16 2016-04-01       Y     0
17 2016-05-01       Y     0
18 2016-06-01       Y     0
19 2016-07-01       Y    50
20 2016-08-01       Y     0
21 2016-09-01       Y     0
22 2016-10-01       Y     0
23 2016-11-01       Y     0
24 2016-12-01       Y     0

padr can also work magrittr style if needed, see the vignette's of padr.

phiver
  • 23,048
  • 14
  • 44
  • 56
  • Can you help me on how to do this if I have more than one group in the pad_cust(df, span_date("2016", "2017", by = "month"), group = "Product") – Hbrandi May 04 '18 at 06:38
  • group can hold a vector, so group = c("Product", "Product1", etc). – phiver May 04 '18 at 08:49
0

In base R (the date format makes it a bit cumbersome):

max_date <- as.Date("2016-12-01") # must input this (no way of knowing)

d <- read.table(header = T, stringsAsFactors = F, text = "Data            Product     Sales  
01-2016     X           10
02-2016     X           20
06-2016     X           30
01-2016     Y           40
07-2016     Y           50")

min_dates <- as.Date(tapply(as.Date(paste0("01-",d$Data), format = "%d-%m-%Y"),
                            d$Product, min), origin = "1970-01-01")
dates <- lapply(min_dates, function(x) seq(from = x, to = max_date, by = "1 month"))
want <- data.frame("Data" = format(as.Date(unlist(dates), origin = "1970-01-01"), format = "%m-%Y"),
                   "Product" = rep(unique(d$Product), sapply(dates, length)))
want <- merge(want, d, all.x = T, on = c("Data", "Product"))
want$Sales[is.na(want$Sales)] <- 0
want[order(want$Product, want$Data), ]
r.user.05apr
  • 5,356
  • 3
  • 22
  • 39