0

Im facing an issue with this task I have. Basically I have price data and along with it, the time and day the item was bought. With this I want to find out how many times price changes during the day for an item. for example, in the morning the price was 7000 but in the evening it could be for 4000. So I want to do this over multiple days for multiple items. Theres also an Order ID associated with the purchase but it need not be unique.

I found this article really helpful and i was able to make some progress but cant get exactly what I need Identify a value changes' date and summarize the data with sum() and diff() in R

I've put the dput output for the data so that it can be recreated. The results should look like this

Item  Price_changed_over_all_days Price_changed_in_one_day
x           10                               3
y           4                                1
z           5                                2

Thank you for any advice/help! Please do let me know if I can make the question any clearer.

PS: If possible too, i'd like to be able to tell the highest and lowest price of any given day, if that can be figured out too that'd be great. I know how to do this for a particular day but

structure(list(item = c("x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "z", "z", "z", "z", "z", "z", 
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", 
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", 
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", 
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z"), 
    bought_date = structure(c(1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), bought_time = structure(c(-2209016101, 
    -2209014165, -2209006172, -2208996246, -2208992947, -2208991967, 
    -2209070025, -2209069890, -2209064616, -2209055193, -2209054850, 
    -2209053617, -2209050638, -2209050426, -2209048499, -2209047983, 
    -2209047872, -2209047390, -2209046473, -2209045120, -2209044562, 
    -2209044418, -2209042104, -2209041480, -2209040748, -2209037870, 
    -2209037696, -2209037309, -2209035846, -2209034872, -2209034429, 
    -2209034323, -2209030237, -2209028615, -2209028570, -2209028477, 
    -2209026900, -2209026787, -2209025234, -2209024468, -2209023183, 
    -2209021020, -2209020175, -2209019934, -2209019733, -2209018417, 
    -2209016646, -2209016540, -2209015208, -2209014941, -2209011636, 
    -2209011444, -2209010896, -2209010639, -2209009483, -2209009412, 
    -2209008912, -2209007424, -2209006197, -2209005462, -2209005439, 
    -2209005414, -2209004221, -2208998803, -2208998727, -2208993252, 
    -2208993224, -2208993194, -2208992478, -2208992218, -2209019432, 
    -2209018785, -2209017271, -2209017188, -2209017177, -2209014531, 
    -2209014484, -2209014247, -2209013964, -2209012511, -2209009805, 
    -2209009633, -2209009617, -2209009556, -2209009533, -2209009499, 
    -2209009474, -2209008099, -2209007958, -2209000389, -2209068522, 
    -2209062412, -2209062053, -2209058480, -2209058472, -2209058161, 
    -2209057878, -2209057740, -2209056037, -2209055339, -2209055045, 
    -2209054472, -2209051624, -2209050659, -2209050339, -2209047529, 
    -2209045264, -2209038811, -2209038586, -2209038487, -2209038004, 
    -2209036906, -2209036606, -2209034142, -2209034049, -2209033773, 
    -2209030890, -2209030794, -2209030626, -2209029600, -2209029464, 
    -2209027707, -2209026486, -2209024697, -2209021552, -2209021379, 
    -2209019844, -2209019716, -2209018482, -2209018436, -2209018365, 
    -2209017376, -2209017340, -2209017319, -2209017054, -2209016900, 
    -2209016126, -2209014622, -2209013286, -2209012584, -2209009905, 
    -2209009208, -2209006827, -2209006663, -2208990872, -2209020164, 
    -2209015899, -2209013965, -2209013933, -2209011963, -2209010443, 
    -2209010351, -2209008868, -2209007569, -2209063141, -2209063059, 
    -2209062882, -2209062852, -2209054720, -2209054349, -2209050324, 
    -2209049810, -2209047902, -2209041612, -2209039205, -2209038444, 
    -2209038393, -2209038219, -2209037598, -2209037562, -2209037497, 
    -2209037082, -2209036943, -2209036795, -2209036404, -2209034846, 
    -2209032324, -2209032289, -2209031999, -2209031958, -2209030309, 
    -2209029952, -2209023411, -2209022296, -2209021086, -2209020624, 
    -2209020221, -2209019575, -2209017996, -2209017794, -2209014135, 
    -2209011509, -2209009303, -2209007905, -2209007799, -2209007709, 
    -2209005139, -2209004957, -2208998695, -2208998233, -2208990008, 
    -2208989978), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    ID = c(540273, 540333, 540568, 540734, 540766, 540771, 540808, 
    540810, 540847, 541011, 541022, 541060, 541147, 541160, 541231, 
    541252, 541259, 541283, 541317, 541379, 541396, 541399, 541503, 
    541537, 541562, 541682, 541684, 541704, 541779, 541849, 541879, 
    541883, 542039, 542115, 542117, 542120, 542164, 542166, 542207, 
    542236, 542275, 542358, 542394, 542403, 542414, 542457, 542515, 
    542522, 542579, 542598, 542741, 542749, 542772, 542786, 542825, 
    542831, 542854, 542934, 542975, 543003, 543004, 543005, 543044, 
    543109, 543111, 543156, 543158, 543159, 543162, 543164, 540161, 
    540187, 540230, 540231, 540233, 540322, 540324, 540329, 540344, 
    540384, 540468, 540477, 540480, 540482, 540483, 540485, 540486, 
    540522, 540526, 540683, 540820, 540876, 540880, 540917, 540918, 
    540927, 540934, 540935, 540989, 541005, 541014, 541034, 541114, 
    541146, 541163, 541276, 541371, 541646, 541653, 541658, 541678, 
    541725, 541738, 541892, 541895, 541916, 542015, 542021, 542028, 
    542080, 542084, 542143, 542175, 542225, 542333, 542337, 542409, 
    542415, 542455, 542456, 542460, 542482, 542485, 542487, 542500, 
    542505, 542544, 542610, 542677, 542704, 542814, 542837, 542950, 
    542955, 543174, 540141, 540281, 540343, 540348, 540401, 540453, 
    540457, 540500, 540535, 540865, 540866, 540869, 540871, 541027, 
    541038, 541165, 541187, 541257, 541533, 541627, 541661, 541662, 
    541668, 541691, 541693, 541695, 541713, 541723, 541731, 541751, 
    541850, 541960, 541963, 541978, 541981, 542035, 542053, 542269, 
    542301, 542355, 542375, 542390, 542424, 542466, 542471, 542642, 
    542745, 542835, 542911, 542917, 542920, 543019, 543031, 543112, 
    543117, 543178, 543179), price = c(7190, 9200, 7170, 7170, 
    7170, 9170, 7170, 7170, 7170, 9170, 7170, 9170, 8330, 7170, 
    9170, 7170, 9170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 
    7170, 7170, 7170, 9170, 9170, 9170, 9170, 9170, 8330, 7170, 
    7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 9170, 
    7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 
    7170, 7170, 9170, 7170, 7160, 7160, 7160, 7160, 7160, 7160, 
    7160, 7160, 8330, 8330, 8330, 8330, 7190, 7190, 7190, 7190, 
    7190, 7190, 7190, 7190, 9200, 9200, 7190, 7190, 7190, 7190, 
    7190, 7190, 7190, 9200, 9200, 9170, 9170, 7170, 7170, 9170, 
    7170, 7170, 9170, 9170, 8330, 9170, 8330, 8330, 9170, 7170, 
    9170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170, 7170, 
    7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 7170, 9170, 
    9170, 9170, 7170, 7170, 7170, 9170, 9170, 7170, 7170, 7170, 
    7170, 7170, 7170, 8330, 7170, 9170, 7170, 9170, 7170, 7170, 
    9160, 7190, 9200, 7190, 7190, 14880, 9200, 9200, 9200, 7190, 
    7170, 7170, 7170, 7170, 7170, 7170, 8330, 7170, 9170, 7170, 
    9170, 9170, 9170, 9170, 7170, 7170, 7170, 9170, 7170, 7170, 
    7170, 7170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170, 
    9170, 9170, 9170, 9170, 7170, 7170, 7170, 8330, 7170, 7170, 
    7170, 7170, 7160, 7160, 9160, 7160, 9160, 9160)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -202L))
Waldi
  • 39,242
  • 6
  • 30
  • 78
MSasta
  • 86
  • 8

2 Answers2

4

For starters, I assume your data is in variable df. You have to think carefully about what you want to count. If you want the number of price changes regardless of the days, you can do this:

df %>% group_by(item) %>% 
  summarise(Price_changed_over_all_days = 
      sum((lead(price) - price)!=0, na.rm = TRUE))

#  A tibble: 3 x 2
#  item  Price_changed_over_all_days
#  <chr>                       <int>
#1 x                              24
#2 y                              30
#3 z                              24

However, if you want to count the number of price changes in particular days, you will get something like this:

df %>% group_by(item, bought_date) %>% 
      summarise(Price_changed_in_one_day = 
          sum((lead(price) - price)!=0, na.rm = TRUE))
#  A tibble: 6 x 3
#  Groups:   item [3]
#  item  bought_date         Price_changed_in_one_day
#  <chr> <dttm>                                 <int>
#1 x     2020-09-21 00:00:00                        3
#2 x     2020-09-22 00:00:00                       20
#3 y     2020-09-21 00:00:00                        4
#4 y     2020-09-22 00:00:00                       26
#5 z     2020-09-21 00:00:00                        5
#6 z     2020-09-22 00:00:00                       18

It's just that in this case, you have a lot more rows in the summary table. If you only want one table, you have to somehow assemble it and decide some stats based on the values for the days. Maybe the average will be appropriate here? I do not know that.

df %>% group_by(item) %>% 
  summarise(Price_changed_over_all_days = 
              sum((lead(price) - price)!=0, na.rm = TRUE)) %>% 
  left_join(
    df %>% group_by(item, bought_date) %>% 
      summarise(Price_changed_in_one_day = 
                  sum((lead(price) - price)!=0, na.rm = TRUE)) %>% 
      group_by(item) %>% 
      summarise(Price_changed_in_one_day = 
                  mean(Price_changed_in_one_day)
      ), by= "item")
#  A tibble: 3 x 3
#  item  Price_changed_over_all_days Price_changed_in_one_day
#  <chr>                       <int>                    <dbl>
#1 x                              24                     11.5
#2 y                              30                     15  
#3 z                              24                     11.5

Also note that price changes can occur at the turn of the day and therefore the sum of changes in several days for a given product does not have to equal the sum of all price changes for that product. In your case, this is the case for product "x".

Marek Fiołka
  • 4,825
  • 1
  • 5
  • 20
  • I really like your elaborate, narrative style of answering. – Martin Gal Sep 09 '21 at 18:20
  • Great response, thank you for the effort you put in. One question, and I cannot figure out why this is happening, when I sum `(lead(price) - price)` I am getting `NA` – MSasta Sep 10 '21 at 08:11
  • To understand how a function works, you usually have to try it out yourself. So I suggest that you do the following commands `lead(1:4)` then `lead(1:4) -1:4`. See what you get the result and why later in `sum((lead(1:4) - 1:4)! = 0, na.rm = TRUE))` you need to enter `na.rm = TRUE`. Hopefully this approach teaches you much more than just writing about how it works. – Marek Fiołka Sep 10 '21 at 09:10
3

With data.table you could use rleid:

library(data.table)
setDT(data)
data[,.(times=max(rleid(price))-1),by=.(item)]
#   item times
#1:    x    24
#2:    y    30
#3:    z    24

data[,.(timesday=max(rleid(price))-1),by=.(item,bought_date)]
#   item bought_date timesday
#1:    x  2020-09-21        3
#2:    x  2020-09-22       20
#3:    y  2020-09-21        4
#4:    y  2020-09-22       26
#5:    z  2020-09-21        5
#6:    z  2020-09-22       18
Waldi
  • 39,242
  • 6
  • 30
  • 78