0

I'm trying to find the Start and End date of a product's sales period from a column of data that is a dummy variable for sale. Here is a proxy of the type of data that I am working with: enter image description here

The result I am looking for is:

enter image description here

The actual data set I am working on is much larger than this and does not necessarily look at just 2010-01 to 2011-12.

Thank you!

Community
  • 1
  • 1
Al. So
  • 53
  • 5
  • Seems like get min and max with categories of a variable would have been asked many time before. Have you been unable to find prior questions like this? At any rate you DO need to post an example in code. – IRTFM Feb 08 '19 at 20:39

1 Answers1

0

This assumes only one sale per product

require(tidyverse)

df <- data.frame(product = 'Product A', 
                 month = seq(as.Date('2010-01-01'),
                             as.Date('2010-10-01'),
                             by = 'month'
                             ),
                 onSale = c(rep(0,3), rep(1,4),rep(0,3))
                 )


df %>% 
  group_by(product) %>% 
  summarise(saleStart = month[which.min(month[onSale == 1])],
            salend    = month[which.max(month[onSale == 1])]
            )

Edit:

df <- data.frame(product = 'Product A', 
                 month = seq(as.Date('2010-01-01'),
                             as.Date('2011-09-01'),
                             by = 'month'
                             ),
                 onSale = c(rep(0,3), rep(1,4),rep(0,3), rep(1,4),rep(0,3), rep(1,4))
                 )


df %>% 
  group_by(product) %>% 
  mutate(diff = c(0,diff(onSale))) %>% 
  group_by(product, diff) %>% 
  filter(diff == 1) %>% 
  mutate(monthStart = month, monthEnd = month  %m+% months(1)) %>% 
  select(-month,-diff)
mr.joshuagordon
  • 754
  • 4
  • 8