2

I have a dataset products like so:

> head(featured_products)
   Dept Class     Sku                    Description Code Vehicle/Placement  StartDate    EndDate  Comments(Circulation,Location,etc)
1:  430  4318  401684          ++INDV RAMEKIN WP 9CM  OSM          Facebook 2017-01-01 2017-01-29                   Fancy Brunch Blog
2:  430  4318  401684          ++INDV RAMEKIN WP 9CM  OSM           Twitter 2017-01-01 2017-01-29                   Fancy Brunch Blog
3:  340  3411 1672605            ++ SPHERE WILLOW 4"  OP1         Editorial 2016-02-29 2016-03-27                Spruce up for Spring
4:  230  2311 2114074 ++BOX 30 ISLAND ORCHRD TLIGHTS   EM             Email 2016-02-17 2016-02-17 Island Orchard and Jeweled Lanterns
5:  895  8957 2118072            ++PAPASAN STL TAUPE  OSM         Instagram 2017-08-26 2017-10-01                    by @audriestorme
6:  895  8957 2118072            ++PAPASAN STL TAUPE   EM             Email 2017-11-23 2017-11-23               Day 2 Black Friday AM

and another dataset sales like so:

      SKU ActivityDate OnlineSalesQuantity OnlineDiscountPercent InStoreSalesQuantity InStoreDiscountPercent
1: 401684   2015-12-01                 150                  0.00                  406                   2.72
2: 401684   2015-12-02                   0                  0.00                  556                   3.79
3: 401684   2015-12-03                   0                  0.00                  723                   3.44
4: 401684   2015-12-04                  16                  4.91                  781                   2.46
5: 401684   2015-12-05                  17                  0.00                  982                   3.18
6: 401684   2015-12-06                   0                  0.00                  851                   3.12

Now...how can I create a flag column in the sales dataset entitled "featured"? This should be 1 if the ActivityDate falls between the times listed in products(StartDate, EndDate) and 0 otherwise...

I have tried several suggested posts on creating intervals out of POSIXct times but none of them really seem to suit my needs.

Suggestions would be very nice. Thank you.

zsad512
  • 861
  • 3
  • 15
  • 41
  • Possible duplicate of [Check if a date is within an interval in R](https://stackoverflow.com/questions/41497351/check-if-a-date-is-within-an-interval-in-r) – Maurits Evers Jan 22 '18 at 00:08
  • ^ didnt work for me – zsad512 Jan 22 '18 at 00:26
  • if the values given are of class date, use `as.numeric(mapply(dplyr::between,sales$activitydate,product$startdate,product$enddata))` otherwise transform the class to date then use this code – Onyambu Jan 22 '18 at 00:41

2 Answers2

2

This can be solved using a non-equi join:

library(data.table)
setDT(sales)[, featured := 0][setDT(featured_products), 
             on = .(SKU, ActivityDate >= StartDate, ActivityDate <= EndDate), 
             featured := 1][]
       SKU ActivityDate featured
1:  401684   2017-01-01        1
2:  401684   2016-03-15        0
3: 1672605   2016-03-22        1
4: 1672605   2017-01-15        0

Make sure that all columns involved in the non-equi join, i.e., ActivityDate, StartDate, and EndDate, are of the same type/class, either POSIXct or Date or IDate, preferably Date if time of day is not relevant.

Minimal reproducible datasets

featured_products <- data.frame(
  SKU = c(401684, 1672605), 
  StartDate = as.POSIXct(c("2017-01-01", "2016-02-29")), 
  EndDate = as.POSIXct(c("2017-01-29", "2016-03-27")))
sales <- data.frame(
  SKU = c(401684, 401684, 1672605, 1672605), 
  ActivityDate = as.POSIXct(c("2017-01-01", "2016-03-15", "2016-03-22", "2017-01-15")))

Note that the dates are of class POSIXct as requested by the OP.

Community
  • 1
  • 1
Uwe
  • 41,420
  • 11
  • 90
  • 134
  • I tried this, but upon inspection- it doesnt actually work for me...the featured column is created but its simply filled with 0's – zsad512 Jan 22 '18 at 14:43
  • For some reason `featured_products` has POSIXct dates but then I had to convert the sales$activitydate using `as.Date`...so now they are not compatible? What should I do? – zsad512 Jan 22 '18 at 14:45
  • 1
    Please, make sure that all columns involved in the *non-equi join*, ie., `ActivityDate`, `StartDate`, `EndDate`, are of the same type/class, either `POSIXct` or `Date` or `IDate`, preferably `Date` if time is not relevant. – Uwe Jan 22 '18 at 14:49
0

Based on a minimal example:

library(lubridate)
library(plyr)

featured_products <- data.frame(SKU=c(401684,1672605), StartDate=c("2017-01-01",  "2016-02-29"), EndDate=c("2017-01-29",  "2016-03-27"))
sales <- data.frame(SKU=c(401684,401684, 1672605), ActivityDate=c("2017-01-01", "2016-01-01", "2016-03-22"))

output <- plyr::join(sales, featured_products, by="SKU")

output$ActivityDate <- ymd(output$ActivityDate)
output$StartDate <- ymd(output$StartDate)
output$EndDate <- ymd(output$EndDate)

output$featured <- ifelse(output$ActivityDate>=output$StartDate & output$ActivityDate<=output$EndDate,1,0)

It gives

      SKU ActivityDate  StartDate    EndDate featured
1  401684   2017-01-01 2017-01-01 2017-01-29        1
2  401684   2016-01-01 2017-01-01 2017-01-29        0
3 1672605   2016-03-22 2016-02-29 2016-03-27        1
DJack
  • 4,850
  • 3
  • 21
  • 45