I am building upon my earlier question on SO at Pipe output of one data.frame to another using dplyr
I want to create six correlation matrices that would let me analyze evolution of correlation in $ spent and quantity sold in the last three years. In essence, I am looking for 2 X [3X3]
type list. So far, I am able to create 3X3
list using tidyr::map()
by making separate calls for each Product_Type
and Quantity
, but I have been unsuccessful in doing in one vectorized call. As you will see below, there is a lot of redundancy in my code.
Here's my data:
dput(DFile_Gather)
structure(list(Order.ID = c(456, 567, 345, 567, 2345, 8910, 8910,
789, 678, 456, 345, 8910, 234, 1234, 456), Calendar.Year = c(2015,
2015, 2016, 2015, 2017, 2015, 2015, 2016, 2015, 2015, 2016, 2015,
2016, 2016, 2015), Product_Type = c("Insurance", "Insurance",
"Tire", "Tire", "Rental", "Insurance", "Servicing", "Truck",
"Tire", "Servicing", "Truck", "Rental", "Car", "Servicing", "Tire"
), Mexican_Pesos = c(35797.32, 1916.25, 19898.62, 0, 22548.314011,
686.88, 0, 0, 0, 0, 0, 0, 0, 0, 203276.65683), Quantity = c(0.845580721440663,
0.246177053792905, 2.10266268677851, 1.89588258358317, 0.00223077008050406,
0.454640961140588, 1.92032156606277, 0.475872861771994, 0.587966920885798,
0.721024745664671, 0.696609684682582, 0.0441522564791413, 0.872232778060772,
0.343347997825813, 0.716224049425646)), .Names = c("Order.ID",
"Calendar.Year", "Product_Type", "Mexican_Pesos", "Quantity"), row.names = c(54L,
55L, 13L, 15L, 50L, 58L, 28L, 37L, 16L, 24L, 33L, 48L, 2L, 29L,
14L), class = "data.frame")
Here's my code for the first iteration : i.e. calculate correlation matrix for Product_Type
DFile_Spread_PType <- spread(DFile_Gather[-length(DFile_Gather)],key = Product_Type, value = Mexican_Pesos)
DFile<-DFile_Spread_PType
CYear <- unique(DFile$Calendar.Year)
DFile_Corr_PType <- purrr::map(CYear, ~ dplyr::filter(DFile, Calendar.Year == .)) %>%
purrr::map(~ cor(.[,colnames(DFile)[3:length(colnames(DFile))]]) ) %>%
structure(., names = CYear)
Finally, here's my code for the second iteration for Correlation Matrix by Quantity:
DFile_Spread_Qty <- spread(subset( DFile_Gather, select = -Mexican_Pesos),key = Product_Type, value = Quantity)
DFile<-DFile_Spread_Qty
DFile_Corr_Qty <- purrr::map(CYear, ~ dplyr::filter(DFile, Calendar.Year == .)) %>%
purrr::map(~ cor(.[,colnames(DFile)[3:length(colnames(DFile))]]) ) %>%
structure(., names = CYear)
As you can see above, there is too much redundancy, and the code looks really clunky. I would sincerely appreciate if someone could help me out. I am specifically looking for two things:
1) do what I am doing above by not having any redundancy
2) If possible, get a list of 2X3X3 i.e. Quantity
and Product_Type
at top level, and then 3x3 correlation matrices referenced to each of the above.
I searched similar topics on SO, but I don't think there was any thread on similar topics.
Thanks in advance.