1

I have a table similar to this

Year       Month    Purchase_ind    Value
 2018        1           1           100
 2018        1           1           100
 2018        1           0           100
 2018        2           1            2
 2018        2           0           198
 2018        3           1           568
 2019        1           0           230
   .
   .
   .
  

And I want to do a matrix whth:

  • Year for Y axis
  • Month for X axis
  • in the calculate section, I need (Value with Purchase ind=1)/Total value

Having this as a result:


    2018  2019 2020
1   0.66    0    x
2   0.01    x    x
3    1      x    x
     

Thanks a lot for your help!

Moises T
  • 59
  • 3

3 Answers3

2

You can calculate the proportion for Year and Month and cast the data to wide format :

library(dplyr)
df %>%
  group_by(Year, Month) %>%
  summarise(Value = sum(Value[Purchase_ind == 1])/sum(Value)) %>%
  tidyr::pivot_wider(names_from = Year, values_from = Value)
  #Add values_fill = 0 if you want 0's instead of `NA`'s
  #tidyr::pivot_wider(names_from = Year, values_from = Value, values_fill = 0)

#  Month `2018` `2019`
#  <int>  <dbl>  <dbl>
#1     1  0.667      0
#2     2  0.01      NA
#3     3  1         NA

data

df <- structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2019L), Month = c(1L, 1L, 1L, 2L, 2L, 3L, 1L), Purchase_ind = c(1L, 
1L, 0L, 1L, 0L, 1L, 0L), Value = c(100L, 100L, 100L, 2L, 198L, 
568L, 230L)), class = "data.frame", row.names = c(NA, -7L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

using data.table:

DT <- data.table(year = c(2018,2018,2018,2018,2018,2018,2019),
                 month = c(1,1,1,2,2,3,1),
                 purchase_ind = c(1,1,0,1,0,1,0),
                 value = c(100,100,100,2,198,568,230))

DT[, value_ind := fifelse(purchase_ind == 1, value, 0)]
DT <- copy(DT[, .(calculate_session = sum(value_ind) / sum(value)), by = .(year, month)])

dcast(DT, month ~ year, value.var = 'calculate_session')

Output:

   month      2018 2019
1:     1 0.6666667    0
2:     2 0.0100000   NA
3:     3 1.0000000   NA
daniellga
  • 1,142
  • 6
  • 16
2

in base R you could do:

(a <- prop.table(xtabs(Value ~ Month + Year + Purchase_ind, df), c(1, 2)))

, , Purchase_ind = 0

     Year
Month      2018      2019
    1 0.3333333 1.0000000
    2 0.9900000          
    3 0.0000000          

, , Purchase_ind = 1

     Year
Month      2018      2019
    1 0.6666667 0.0000000
    2 0.0100000          
    3 1.0000000    

of course if you only need the purchase_ind = 1, you could just subscript it:

 a[, , "1"] #or even a[, , 2]

    Year
Month      2018      2019
    1 0.6666667 0.0000000
    2 0.0100000          
    3 1.0000000   

      
Onyambu
  • 67,392
  • 3
  • 24
  • 53