1

In order to explain my question better, I have prepared the following sample data that looks similar to my original dataset:

library(zoo)
sample_data <- data.frame(User =              c("customer1", "customer2", "customer3", "customer4", "customer5", "customer1", "customer2", "customer3", "customer4", "customer1", "customer3", "customer5"), 
                          Cohort = as.yearmon(c("2020-03-01", "2020-02-17", "2020-04-10","2020-02-01", "2020-04-10", "2020-03-01", "2020-02-17", "2020-04-10","2020-02-01", "2020-03-01", "2020-04-10", "2020-04-30"), "%Y-%m-%d"),
                          Purchase_month = as.yearmon(c("2020-03-01", "2020-02-17", "2020-04-10", "2020-02-01", "2020-04-10", "2020-07-05", "2020-03-05", "2020-06-11","2020-03-07", "2020-11-01", "2020-11-04", "2020-06-30"), "%Y-%m-%d"),
                          Revenue = c(25, 34, 20, 50, 75, 80, 100, 76, 39, 20, 10, 90))

As you can see, I have purchase data with a column indicating the customer, one with the Cohort they belong to (the month they placed their very first order), another column where you can find their their purchase dates and one with their spend for each purchase.

What I would like to do is to change the table in order to show the Cohorts per row and the revenue per cohort per month in each column. The outcome should basically look like the following

Cohort   | Feb 2020 | Mar 2020 | Apr 2020 | May 2020 | Jun 2020 | Jul 2020 | Aug 2020 | Sep 2020 | Oct 2020 | Nov 2020 | Dec 2020 
Feb 2020 | 84       | 139      | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        
Mar 2020 | 0        | 25       | 0        | 0        | 0        | 80       | 0        | 0        | 0        | 20       | 0        
Apr 2020 | 0        | 0        | 95       | 0        | 169      | 0        | 0        | 0        | 0        | 10       | 0          

Thank you!

Ps: Maybe the title is not quite fitting, but I didn't know how such a transformation of the table is called.

emil_rore
  • 115
  • 1
  • 7
  • `tidyr::pivot_wider(sample_data, names_from = Purchase_month, values_from = Revenue, values_fill = 0)` ? – Ronak Shah Nov 06 '20 at 09:18
  • hey Ronak! Yes, I guess kind of. I just don't need the data per customer, but also aggregated per cohort. Also, my dataset consists of more than just the 4 columns that are indicated in the sample data - so somehow i would want to tell the code to take the "Cohort" as columns". I cannot see this indication in the code yet. – emil_rore Nov 06 '20 at 09:24

1 Answers1

1

We can remove the User column and do the sum of Revenue values.

library(dplyr)
library(tidyr)

sample_data %>%
  select(-User) %>%
  pivot_wider(names_from = Purchase_month, values_from = Revenue, values_fill = 0, values_fn = sum)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Hey Ronak! Thanks a lot. Any chance so that the columns are also correctly ordered by month? My dates ranges from 2018 until 2020 - so far I can see that the result mixes up the column order. The ideal would be if the dates could be ordered consecutively, as seen in my output example. – emil_rore Nov 06 '20 at 09:32
  • 1
    You can sort the data by `Purchase_month` before doing `pivot_wider`: `sample_data %>% select(-User) %>% arrange(Purchase_month) %>% pivot_wider(...)` – shizundeiku Nov 06 '20 at 09:52
  • @shizundeiku perfect! very nice, it helped! Thanks a lot! – emil_rore Nov 06 '20 at 09:57