0

Hi I have a data set called data1 with 3 columns one is called Close.Date the other one is called Closed.Won.MRR the other one is called Sign.up.Date

A sample of the data

Close.Date Closed.Won.MRR Sign.up.Date
2020-04         447.0     2020-02
2020-03         100.0     2020-02
2022-04         536.4     2022-03
2022-06         150.0     2022-03
2022-06         125.0     2022-03
2022-06         147.0     2021-10
...                 ...        ....

with this data and with code I already created a dataframe called cohort_table2 that on the rows has all the Sign.up.Date possible as string names on a column called Period and on the columns have all the Close.Date possible as string column titles

it looks like this but right now is empty

Period    2020-02  2020-03  2020-04  ... ... ..  
2020-02      0        0        0
2020-03      0        0        0
2020-04      0        0        0
.
.
. 
.

how can I go row by row of my data and put it in the right bucket and that it keeps going summing each value in the right position until is finished, for example a record

Close.Date Closed.Won.MRR Sign.up.Date
2022-04         447.0     2022-04

it will put a value of 447 on the location for (row/ period) 2022-04 and column 2022-04 (close.Date), then it will move to the next record do the same until it finishes with all the records

for example for the first 2 records the dataframe cohort_table2 will look like this

Period    2020-02  2020-03  2020-04  ... ... ..  
2020-02      0      100        447
2020-03      0        0        0
2020-04      0        0        0
.
.
. 
.
Juan Lozano
  • 635
  • 1
  • 6
  • 17
  • 1
    Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – Aurèle Jul 19 '22 at 07:25
  • For me it is not clear what you need? Maybe you could clarify with an expected output! Many thanks. – TarJae Jul 19 '22 at 07:31
  • 1
    @TarJae I added an expected output for the first to records of my data – Juan Lozano Jul 19 '22 at 07:39
  • There exists non-unique sets of signup and closed one, `2022-03` , `2022-06`. How do you want to handle this? – Park Jul 19 '22 at 07:47
  • @Park the dataframe cohort_table2 has all the possible values from the min of date to the max of date, even if it has no data in it, thats why I created that dataframe cause I also need to show the dates were there are no data in it, the sample of cohort_table2 that I show there is little but it ges all the way from 2020-02 (min value) to 2022-06 max value – Juan Lozano Jul 19 '22 at 13:25
  • I mean, there are two values for that case. – Park Jul 19 '22 at 23:53
  • I think OP wants them summed, based on "summing each value". – Stewart Macdonald Jul 20 '22 at 01:40

1 Answers1

1

I'm not sure I entirely understand, but are you after something like this?

library(dplyr)

# Read sample data in as a data frame
dataDF <- read.csv(text='Close.Date,Closed.Won.MRR,Sign.up.Date
2020-04,447.0,2020-02
2020-03,100.0,2020-02
2022-04,536.4,2022-03
2022-06,150.0,2022-03
2022-06,125.0,2022-03
2022-06,147.0,2021-10')

# Get the list of all unique dates and sort it
all_dates <- sort(unique(c(dataDF$Close.Date, dataDF$Sign.up.Date)))

# Create an empty matrix to hold our data
results_mat <- matrix(NA, ncol=length(all_dates), nrow=length(all_dates))
colnames(results_mat) <- all_dates
rownames(results_mat) <- all_dates

# Calculate the sums of all existing data combinations
data_sum <- dataDF %>%
  group_by(Close.Date, Sign.up.Date) %>%
  summarise(sum = sum(Closed.Won.MRR)) %>%
  pivot_wider(names_from='Close.Date', values_from = 'sum')

data_mat <- data_sum %>%
  dplyr::select(-Sign.up.Date) %>%
  as.matrix()

rownames(data_mat) <- data_sum$Sign.up.Date

# Insert our sums into our empty matrix
results_mat[row.names(data_mat), colnames(data_mat)] <- unlist(data_mat)
results_mat

        2020-02 2020-03 2020-04 2021-10 2022-03 2022-04 2022-06
2020-02      NA     100     447      NA      NA      NA      NA
2020-03      NA      NA      NA      NA      NA      NA      NA
2020-04      NA      NA      NA      NA      NA      NA      NA
2021-10      NA      NA      NA      NA      NA      NA     147
2022-03      NA      NA      NA      NA      NA   536.4     275
2022-04      NA      NA      NA      NA      NA      NA      NA
2022-06      NA      NA      NA      NA      NA      NA      NA


Update: edited code based on OP's comments

Stewart Macdonald
  • 2,062
  • 24
  • 27
  • is really close to what I need but I need all the possible dates in the row and the column from the min to the max the max date, even if it doesnt have data in it, thats why I created the data frame called cohort_table2, now that I have that datfram I just need to populate it – Juan Lozano Jul 19 '22 at 13:23
  • 1
    I've updated the code to make sure that all dates are now included in the final result's rows AND columns. Is that what you're after? – Stewart Macdonald Jul 20 '22 at 01:39