I have a dataset like so:
> head(featured_products)
Dept Class Sku Description Code Vehicle/Placement StartDate EndDate Comments(Circulation,Location,etc)
1: 430 4318 401684 ++INDV RAMEKIN WP 9CM OSM Facebook 2017-01-01 2017-01-29 Fancy Brunch Blog
2: 430 4318 401684 ++INDV RAMEKIN WP 9CM OSM Twitter 2017-01-01 2017-01-29 Fancy Brunch Blog
3: 340 3411 1672605 ++ SPHERE WILLOW 4" OP1 Editorial 2016-02-29 2016-03-27 Spruce up for Spring
4: 230 2311 2114074 ++BOX 30 ISLAND ORCHRD TLIGHTS EM Email 2016-02-17 2016-02-17 Island Orchard and Jeweled Lanterns
5: 895 8957 2118072 ++PAPASAN STL TAUPE OSM Instagram 2017-08-26 2017-10-01 by @audriestorme
6: 895 8957 2118072 ++PAPASAN STL TAUPE EM Email 2017-11-23 2017-11-23 Day 2 Black Friday AM
And another like this:
SKU ActivityDate OnlineSalesQuantity OnlineDiscountPercent InStoreSalesQuantity InStoreDiscountPercent
1: 401684 2015-12-01 150 0.00 406 2.72
2: 401684 2015-12-02 0 0.00 556 3.79
3: 401684 2015-12-03 0 0.00 723 3.44
4: 401684 2015-12-04 16 4.91 781 2.46
5: 401684 2015-12-05 17 0.00 982 3.18
6: 401684 2015-12-06 0 0.00 851 3.12
I have added a column to the 2nd df called featured
which is 1 if the product was listed in the 1st df for the given date and 0 otherwise.
Now, what I would like to do is also add the Vehicle/Placement
column to the new, combined df (when featured
== 1)...the issue here is that for different dates there could be different vehicles, or multiple...
How can I scan the date for the featured == 1
rows and compare this to df1 and then extract Vehicle/Placement
and add it to the combined df?
This also must be done efficiently because df2 is 2.85 million rows...
I am looking for something along the lines of:
# Add vehicle
if(combined$featured == 1) {
for (n in 1:nrow(featured_products)) {
for (m in 1:nrow(combined)) {
combined$vehicle <- ifelse(combined$activitydate[m] %within% interval(featured_products$startdate[n],featured_products$enddate[n]), featured_products$`vehicle/placement`, NA)
}
}
}
But this produces the error:
Warning message: In if (combined$featured == 1) { : the condition has length > 1 and only the first element will be used
I think I found an alternate solution, but it is EXTREMELY slow and will take hours to run:
# Add vehicle
for (n in 1:nrow(combined)) {
cat("Searching Featured Products...", "\n")
data <- featured_products[which(featured_products$sku == combined$sku[n]),]
cat("Matching dates...", "\n")
for (m in 1:nrow(data)) {
combined$vehicle <- ifelse(combined$activitydate %within% interval(data$startdate[m],data$enddate[m]), data$`vehicle/placement`, NA)
}
}