1

Excel function I want to replicate in R

https://i.stack.imgur.com/MHuXO.png

In R I'm looking to create a row calculation that averages prices based on Product type and a date range based on Order_Date. In the example from Excel, I am averaging Price (column B) of Product B (row 4) based on other Product Bs that were ordered within a 30 day period of row 4's Order_Date (column C), meaning I want to average all Product B price that occurred within +/- 15 days of row 4's order. Ultimately, I need to find the squared difference between Price and average price (column E).

This post is similar to my question, however, I'm struggling to draw parallels between mine and the one asked here: Excel's AVERAGEIFS() in R

Here's what I tried in R:

dfdata <- data.frame(RSTUDIO_AVERAGEIFS)

dfdata$Avg_Price <- with(dfdata, mean(dfdata[
  dfdata$Product == dfdata$Product &
    dfdata$Order_Date >= (as.Date(dfdata$Order_Date) - days(15)) &
    dfdata$Order_Date <= (as.Date(dfdata$Order_Date) + days(15)),
  "Price"
]))

When I look at the calculated Avg_Price it is one number for each row. As a check for myself, column Mean and Avg_Price should be equal to each other.

Mean and Avg_Price column should be equal

(https://i.stack.imgur.com/FTc7E.png)

Here's my dataset:

structure(list(Product = c("A", "B", "B", "F", "D", "F", "A", 
"F", "D", "A", "A", "D", "C", "C", "A", "B", "C", "A", "B", "A", 
"A", "E", "D", "F", "B", "B", "E", "F", "F", "E", "F", "A", "A", 
"D", "F", "C", "C", "C", "A", "D", "D", "E", "D", "B", "C", "B", 
"D", "F", "C", "A", "A", "F", "D", "E", "B", "B", "A", "E", "A", 
"D", "E", "C", "C", "C", "E", "D", "F", "E", "B", "E", "D", "B", 
"A", "B", "D", "F", "C", "C", "E", "A", "A", "F", "D", "D", "A", 
"F", "C", "A", "A", "F", "A", "B", "A", "D", "C", "C", "A", "B", 
"E", "B", "D", "B", "F", "F", "B", "C", "B", "B", "C", "F", "A", 
"B", "A", "E", "C", "E", "E", "E", "D", "B", "C", "D", "B", "C", 
"F", "F", "C", "F", "D", "F", "A", "D", "B", "B", "C", "E", "B"
), Price = c(6.502, 4.526, 2.272, 9.097, 0.594, 5.41, 9.85, 1.197, 
5.09, 7.343, 3.339, 1.107, 7.993, 7.922, 4.558, 4.75, 1.278, 
9.55, 8.223, 6.195, 0.668, 9.741, 9.679, 3.488, 4.159, 3.756, 
6.233, 7.658, 8.896, 9.724, 6.582, 7.422, 1.172, 2.734, 4.917, 
0.784, 9.284, 0.7, 6.869, 3.054, 9.945, 4.173, 5.217, 6.016, 
5.559, 5.247, 7.024, 8.845, 8.436, 7.482, 8.609, 3.675, 2.76, 
5.357, 4.125, 3.199, 7.736, 5.255, 5.581, 5.282, 2.753, 1.568, 
2.083, 8.938, 7.562, 7.513, 9.493, 8.404, 5.266, 9.992, 3.813, 
5.522, 6.295, 5.385, 1.91, 3.597, 4.105, 9.484, 6.697, 4.818, 
1.644, 2.699, 0.608, 9.6, 0.447, 4.123, 2.997, 5.085, 0.903, 
5.455, 1.869, 4.053, 8.843, 1.171, 8.491, 9.236, 5.642, 6.565, 
3.168, 4.367, 6.008, 6.267, 8.363, 0.318, 5.226, 6.597, 2.932, 
3.149, 8.578, 1.814, 9.288, 9.96, 8.44, 3.514, 2.832, 5.881, 
4.57, 7.646, 2.19, 5.446, 5.318, 3.674, 6.235, 9.414, 7.201, 
9.846, 0.824, 0.757, 4.928, 0.499, 9.165, 1.564, 6.944, 3.474, 
8.537, 5.412, 7.142), Order_Date = structure(c(19208, 19210, 
19337, 19288, 19339, 19318, 19352, 19290, 19245, 19317, 19352, 
19177, 19306, 19300, 19196, 19314, 19270, 19212, 19232, 19199, 
19238, 19274, 19265, 19311, 19257, 19344, 19294, 19331, 19181, 
19348, 19317, 19332, 19334, 19338, 19184, 19274, 19240, 19350, 
19195, 19306, 19198, 19283, 19260, 19263, 19177, 19197, 19186, 
19205, 19264, 19328, 19268, 19190, 19210, 19300, 19273, 19188, 
19280, 19264, 19211, 19241, 19296, 19350, 19260, 19201, 19254, 
19262, 19176, 19175, 19335, 19179, 19196, 19190, 19253, 19178, 
19190, 19234, 19253, 19310, 19323, 19325, 19258, 19292, 19213, 
19269, 19225, 19299, 19350, 19224, 19193, 19250, 19242, 19260, 
19340, 19276, 19211, 19222, 19305, 19305, 19273, 19268, 19306, 
19277, 19225, 19188, 19190, 19181, 19268, 19291, 19272, 19238, 
19184, 19228, 19199, 19237, 19205, 19251, 19357, 19238, 19288, 
19301, 19331, 19260, 19320, 19175, 19253, 19312, 19354, 19357, 
19176, 19187, 19225, 19281, 19352, 19315, 19246, 19308, 19356
), class = "Date"), Mean = c(6.075, 4.887, 4.56, 4.279, 1.664, 
6.597, 7.344, 4.279, 4.816, 6.541, 7.344, 3.742, 8.466, 8.466, 
6.042, 5.294, 4.232, 6.371, 9.092, 6.432, 3.922, 5.584, 6.142, 
5.89, 4.305, 5.076, 4.786, 6.55, 4.633, 7.147, 6.597, 6.18, 5.947, 
1.664, 4.633, 4.232, 7.309, 1.522, 6.108, 4.531, 4.343, 5.214, 
6.796, 4.56, 7.19, 4.744, 4.788, 6.26, 4.211, 6.18, 6.071, 5.235, 
4.282, 4.939, 4.627, 4.916, 8.173, 6.321, 6.169, 5.186, 4.786, 
1.522, 4.829, 6.754, 6.233, 6.142, 4.633, 9.198, 4.382, 9.198, 
5.09, 4.916, 3.817, 4.833, 4.788, 4.591, 6.489, 8.466, 6.055, 
6.18, 5.516, 4.279, 4.438, 5.488, 5.083, 5.075, 1.522, 5.083, 
6.108, 4.823, 2.944, 4.275, 6.132, 5.286, 7.374, 8.864, 6.493, 
4.937, 5.584, 4.56, 4.531, 4.476, 4.591, 4.633, 4.916, 7.19, 
4.56, 5.357, 4.232, 5.286, 6.042, 9.092, 6.432, 5.68, 6.754, 
5.972, 7.147, 5.68, 1.642, 4.677, 5.318, 6.796, 5.258, 7.19, 
4.823, 5.89, 1.522, 0.757, 3.742, 4.633, 5.083, 3.631, 5.029, 
5.294, 6.002, 5.29, 5.947), Price_Less_Mean = c(0.427, -0.361, 
-2.288, 4.818, -1.07, -1.187, 2.506, -3.082, 0.274, 0.802, -4.005, 
-2.635, -0.472999999999999, -0.544, -1.484, -0.544, -2.954, 3.179, 
-0.869, -0.237, -3.254, 4.157, 3.537, -2.402, -0.146, -1.32, 
1.447, 1.108, 4.263, 2.577, -0.0150000000000006, 1.242, -4.775, 
1.07, 0.284, -3.448, 1.975, -0.822, 0.761, -1.477, 5.602, -1.041, 
-1.579, 1.456, -1.631, 0.503, 2.236, 2.585, 4.225, 1.302, 2.538, 
-1.56, -1.522, 0.418, -0.502, -1.717, -0.437, -1.066, -0.587999999999999, 
0.0960000000000001, -2.033, 0.046, -2.746, 2.184, 1.329, 1.371, 
4.86, -0.794, 0.884, 0.794, -1.277, 0.606, 2.478, 0.552, -2.878, 
-0.994, -2.384, 1.018, 0.642, -1.362, -3.872, -1.58, -3.83, 4.112, 
-4.636, -0.952, 1.475, 0.00199999999999978, -5.205, 0.632, -1.075, 
-0.222, 2.711, -4.115, 1.117, 0.372, -0.851, 1.628, -2.416, -0.193, 
1.477, 1.791, 3.772, -4.315, 0.31, -0.593, -1.628, -2.208, 4.346, 
-3.472, 3.246, 0.868, 2.008, -2.166, -3.922, -0.0910000000000002, 
-2.577, 1.966, 0.548, 0.769, 0, -3.122, 0.977, 2.224, 2.378, 
3.956, -0.698, 0, 1.186, -4.134, 4.082, -2.067, 1.915, -1.82, 
2.535, 0.122, 1.195), Avg_Price = c(5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 
5.372, 5.372, 5.372, 5.372, 5.372, 5.372, 5.372)), row.names = c(NA, 
-137L), class = "data.frame")
Nick
  • 15
  • 4
  • 3
    Would be easier for someone to help if you post some reproducible data. You can use `dput(dfdata)`. – nniloc Apr 14 '23 at 20:52
  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064) – MrFlick Apr 14 '23 at 21:12

2 Answers2

0
library(dplyr)
library(slider)
library(lubridate)

df %>%
  arrange(Order_Date) %>%
  mutate(
    Mean = slide_index_dbl(
      Price, Order_Date, mean, .before = days(15), .after = days(15)
    ),
    .by = Product
  )
GegznaV
  • 4,938
  • 4
  • 23
  • 43
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
  • Thank you for your response! The language produces the results I expect. A final question, how do I add the 'Mean' output to the data frame? So, when I run View(dfdata) it shows the calculated 'Mean' column. – Nick Apr 14 '23 at 22:39
  • 1
    Change it's name to e.g. `Mean2` and then you can compare to Mean and see they match (at least to a bunch of decimal places). And assign the output back to the data frame. So `dfdata <- dfdata %>% ` will update the `dfdata` object to include the new column. – Jon Spring Apr 14 '23 at 23:00
0

You can create an averageifs function as below, and apply it rowwise

library(dplyr)
averageifs <- \(d, p, df, n=15) {
  filter(df, between(Order_Date, d-n, d+n), Product==p) %>% summarize(mean(Price)) %>% pull()
}

dfdata %>% rowwise() %>% mutate(meanprice = averageifs(Order_Date,Product, dfdata))

Output:

   Product Price Order_Date meanprice
   <chr>   <dbl> <date>         <dbl>
 1 A       6.50  2022-08-04      6.07
 2 B       4.53  2022-08-06      4.89
 3 B       2.27  2022-12-11      4.56
 4 F       9.10  2022-10-23      4.28
 5 D       0.594 2022-12-13      1.66
 6 F       5.41  2022-11-22      6.60
 7 A       9.85  2022-12-26      7.34
 8 F       1.20  2022-10-25      4.28
 9 D       5.09  2022-09-10      4.82
10 A       7.34  2022-11-21      6.54
# … with 127 more rows

Another (much more efficient) way to approach this specific problem is to do a non-equi join of dfdata on itself, after creating lower and upper bounds of the date for each row:

dplyr::inner_join(
  dfdata,
  dfdata %>% mutate(l=Order_Date-15,u=Order_Date+15),
  by=join_by(Product, Order_Date>=l,Order_Date<=u)
) %>% 
  reframe(meanprice = mean(Price.y),.by=c(Product, Order_Date.x))

Output (first six rows)

  Product Order_Date.x meanprice
1       A   2022-08-04   6.07475
2       B   2022-08-06   4.88650
3       B   2022-12-11   4.55950
4       F   2022-10-23   4.27900
5       D   2022-12-13   1.66400
6       F   2022-11-22   6.59680
langtang
  • 22,248
  • 1
  • 12
  • 27