1

I feel I'm in the realm of the right answer but for the life of me I can't figure out how to do this. I want to take an entry on a product ledger and change the shape of the data where I have a row/case for every life month of a product transaction.

As a note, sometimes these ledger entries are reversed/refunded (and have date_from > date_thru). I have an example of a new purchase and a refund in the data frame below. Also, transaction dates are pre-cleaned to always be the first of the month in which they were made, since I only care about monthly recurring revenue (mrr) on a month-by-month basis.

Sample df:

user <- c("1001", "1002")
line_item <- c("abc123", "def456")
date_from <- as.Date(c("2015-01-01", "2015-06-01"), "%Y-%m-%d")
date_thru <- as.Date(c("2015-04-01", "2014-12-01"), "%Y-%m-%d")
mrr <- c(2.22, -4.44)
df <- cbind.data.frame(user, line_item, date_from, date_thru, mrr)

Output:

  user line_item  date_from  date_thru   mrr
1 1001    abc123 2015-01-01 2015-04-01  2.22
2 1002    def456 2015-06-01 2014-12-01 -4.44

Desired result:

user    line_item       month           mrr
1001    abc123          2015-01-01    2.22
1001    abc123          2015-02-01    2.22
1001    abc123          2015-03-01    2.22
1001    abc123          2015-04-01    2.22
1002    def456          2015-06-01    -4.44
1002    def456          2015-05-01    -4.44
1002    def456          2015-04-01    -4.44
1002    def456          2015-02-01    -4.44
1002    def456          2015-01-01    -4.44
1002    def456          2014-12-01    -4.44

How can I create a new month column by vectorizing a function like seq(date_from, date_thru, by="months"), and then joining all the resulting vectors into one df again like above?

I've been trying lapply, dplyr::mutate, and seq and can't get them to work correctly together.

Thanks in advance!

  • @Shree yes that is correct, it is an example of a reversed `line_item`. A reversed lineitem has a negative value and will have a date_thru that is before a date_from (sort of like a negative date range). – Adam Ellsworth Nov 14 '18 at 02:20
  • I think you are missing date `2015-03-01` for user `1002` in your desired output – Shree Nov 14 '18 at 02:58
  • You should redefine the `seq.Date` fucntion so it accepts positive and negative sequences gracefully. – IRTFM Nov 14 '18 at 03:31

3 Answers3

2

First define a month_seq function which produces the desired date sequence given the from and to dates. Then Map it to the date_from and date_thru columns producing a column of type list whose components are the date sequences. Then unnest the month and select the desired columns.

library(dplyr)
library(tidyr)

month_seq <- function(from, to) seq(from, to, paste(sign(to - from), "month"))

df %>%
  mutate(month = Map(month_seq, date_from, date_thru)) %>%
  unnest %>%
  select(user, line_item, month, mrr)

giving:

   user line_item      month   mrr
1  1001    abc123 2015-01-01  2.22
2  1001    abc123 2015-02-01  2.22
3  1001    abc123 2015-03-01  2.22
4  1001    abc123 2015-04-01  2.22
5  1002    def456 2015-06-01 -4.44
6  1002    def456 2015-05-01 -4.44
7  1002    def456 2015-04-01 -4.44
8  1002    def456 2015-03-01 -4.44
9  1002    def456 2015-02-01 -4.44
10 1002    def456 2015-01-01 -4.44
11 1002    def456 2014-12-01 -4.44
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

Here's a solution using dplyr, tidyr, and lubridate. I found the elapsed_months function on this SO post Number of months between two dates. I have modified it a bit for your case.

Also, I am assuming negative mrr indicates that date_from > date_thru.

elapsed_months <- function(end_date, start_date) {
  ed <- as.POSIXlt(end_date)
  sd <- as.POSIXlt(start_date)
  12 * (ed$year - sd$year) + (ed$mon - sd$mon) + ifelse(ed >= sd, 1, -1)
}

df %>% 
  uncount(weights = abs(elapsed_months(date_thru, date_from)), .id = "Months") %>% 
  mutate(
    Month = date_from %m+% months(sign(mrr)*(Months - 1))
  ) %>% 
  select(user, line_item, Month, mrr)

   user line_item      Month   mrr
1  1001    abc123 2015-01-01  2.22
2  1001    abc123 2015-02-01  2.22
3  1001    abc123 2015-03-01  2.22
4  1001    abc123 2015-04-01  2.22
5  1002    def456 2015-06-01 -4.44
6  1002    def456 2015-05-01 -4.44
7  1002    def456 2015-04-01 -4.44
8  1002    def456 2015-03-01 -4.44
9  1002    def456 2015-02-01 -4.44
10 1002    def456 2015-01-01 -4.44
11 1002    def456 2014-12-01 -4.44
Shree
  • 10,835
  • 1
  • 14
  • 36
0

Just to uncover the special powers of tidyr with the complete and fill functions

library(tidyverse)

df %>% 
  # turn into a long format to handle the dates as one variable
  gather(key, date, date_from:date_thru) %>% 
  # group the dataframe to apply operations separately on each user
  group_by(user) %>% 
  # now complete the dataset with the implicit dates within the range of dates
  # Note the consideration of: date_from > date_to with 'min' and 'max'
  complete(date = seq.Date(from = min(date),to = max(date),by = "month") ) %>% 
  #now fill in the missing 'mrr' 
  fill(mrr, line_item)


# A tibble: 11 x 5
# Groups:   user [2]
   user  date       line_item   mrr key      
   <fct> <date>     <fct>     <dbl> <chr>    
 1 1001  2015-01-01 abc123     2.22 date_from
 2 1001  2015-02-01 abc123     2.22 <NA>     
 3 1001  2015-03-01 abc123     2.22 <NA>     
 4 1001  2015-04-01 abc123     2.22 date_thru
 5 1002  2014-12-01 def456    -4.44 date_thru
 6 1002  2015-01-01 def456    -4.44 <NA>     
 7 1002  2015-02-01 def456    -4.44 <NA>     
 8 1002  2015-03-01 def456    -4.44 <NA>     
 9 1002  2015-04-01 def456    -4.44 <NA>     
10 1002  2015-05-01 def456    -4.44 <NA>     
11 1002  2015-06-01 def456    -4.44 date_from

If you want to remove the keyvariable add %>% select(-key) following after the end of the fill although you might want to keep it for some reason...

P.S. I really liked the clever trick in the function creation by @G. Grothendieck with the sign to handle date_from > date_to

Lefkios Paikousis
  • 462
  • 1
  • 6
  • 12