-1

I have a dataset called "order_product" like this:

order_id   product  order_sequence  reorder
    1        egg          1            1
    1        meat         2            0
    1        fruit        3            1
    1        meat         4            1
    2        egg          1            1
    2        egg          2            1
    2        fruit        3            0
    3        egg          1            0
    3        fruit        2            1
    3        fruit        3            1

and I am going to aggregate the data into a new data frame called "product" which is grouped by product. The variables of new aggregated dataset shows the total frequency, the reorder rate and mean sequence of each product. The calculation of each variable are as follows:

frequency: product count
reorder_rate: sum of reorder/frequency 
mean_sequence: sum or order_sequence/frequency

So the result should be like this:

product  frequency reorder_rate  mean_sequence
  egg        4          3/4           5/4
  meat       2          1/2            3
 fruit       4          3/4           11/4

Can anybody help me with this in R? I tried the melt() function in package data.table, but I don't know how to code it.

jay.sf
  • 60,139
  • 8
  • 53
  • 110
frida guo
  • 61
  • 1
  • 7

1 Answers1

3

Such calculations are easy using dplyr

library(dplyr)

df %>%
  group_by(product) %>%
  summarise(frequency = n(), 
            reorder_rate = sum(reorder)/frequency, 
            mean_sequence = sum(order_sequence)/frequency)

# A tibble: 3 x 4
#  product frequency reorder_rate mean_sequence
#  <fct>       <int>        <dbl>         <dbl>
#1 egg             4         0.75          1.25
#2 fruit           4         0.75          2.75
#3 meat            2         0.5           3  

However, you can also use data.table

library(data.table)

setDT(df)[, .(frequency = .N, reorder_rate = sum(reorder)/.N, 
              mean_sequence = sum(order_sequence)/.N), by = product]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213