0

I have a large dataset of sales information from multiple stores over a period of a few weeks. I need to calculate revenues and average prices over minute intervals, and I can't figure out a smart way to do this. So for example for the data below, I'd want to calculate the revenues and average prices over 10-minute periods, for example the 10-minute period on 2019-02-11 from 09:10:00 to 09:20:00 would result in 2 * 14 + 5 * 9. I've considered labeling each interval with a number and adding a column with the labels, but I don't really know how to implement this. Another option I thought of was to create a separate dataframe with the intervals, and then somehow mapping information from the original data to the interval dataframe, but I didn't get far with this either. Any help on this would be much appreciated!

Example data:

Time Quantity Price
2019-02-11 09:15:23 2 14
2019-02-11 09:18:01 5 9
2019-02-11 10:15:23 1 12
2019-02-11 09:28:01 5 9
J.math
  • 3
  • 1
  • 1
    Can you try to make your question more reproducible? For example, use `dput(head(data))` to generate data that is easy to copy and paste into R. Show us what code you have tried. What is your expected output? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Ben Norris Apr 12 '21 at 21:15
  • Sorry for the clumsy format, I'm a bit new to this. Fortunately crestor's solution worked like a charm, the output is exactly what I wanted. – J.math Apr 13 '21 at 05:52

1 Answers1

0
library(tidyverse)
library(lubridate)


df <- read.table(textConnection("time;quantity;unit_price
2019-02-11 09:15:23;2;14
2019-02-11 09:18:01;5;9
2019-02-11 10:15:23;1;12
2019-02-11 09:28:01;5;9"),
                 sep = ";",
                 header = TRUE)
df1 <- df %>%
  mutate(
    time = lubridate::ymd_hms(time),
    time_10min = floor_date(time, "hour") + minutes(minute(time) %/% 10 * 10)
  )
df1
#>                  time quantity unit_price          time_10min
#> 1 2019-02-11 09:15:23        2         14 2019-02-11 09:10:00
#> 2 2019-02-11 09:18:01        5          9 2019-02-11 09:10:00
#> 3 2019-02-11 10:15:23        1         12 2019-02-11 10:10:00
#> 4 2019-02-11 09:28:01        5          9 2019-02-11 09:20:00

df1 %>%
  group_by(time_10min) %>%
  summarise(avg_price = mean(unit_price),
            revenue = sum(quantity * unit_price))
#> # A tibble: 3 x 3
#>   time_10min          avg_price revenue
#>   <dttm>                  <dbl>   <int>
#> 1 2019-02-11 09:10:00      11.5      73
#> 2 2019-02-11 09:20:00       9        45
#> 3 2019-02-11 10:10:00      12        12
crestor
  • 1,388
  • 8
  • 21