I just started with R and have not been able to find a fix; I have read multiple answers but none was suitable. I am trying to calculate correlation and use it as a distance measure between a set of stores, in order to come up with trial-control pairings and then assess whether a marketing campaign had a significant influence on post-campaign sales.
Total sales before the marketing campaign is the metric of interest. I have seven months' worth of it for each store and would like to loop through all of them to find the most suitable trial-control pairing for each month. Three stores were the subject of the marketing campaign (the trial stores), which also ran for three months, hence the need to find a good trial-control store match for each month.
Here is what I have come up with so far, which seems to be working. However, I have yet to understand how to store the results in a handy format that I can subsequently use to find the highest trial-control store correlation for each month:
my.fun <- function(trial){
  for (store in st.vector) {
    trial <- stores_stats_pre %>% filter(store_nbr == trial) %>% select(total_sales)
    control <- stores_stats_pre %>% filter(store_nbr == store) %>% select(total_sales)
    cor(control$total_sales, trial$total_sales)
  }
}
I would then simply call it as my.fun(trial_store_number).
st.vector contains the stores' unique IDs (the trial stores were removed to avoid calculating their correlation with themselves):
trial_stores <- c(77, 86, 88)
st.vector <- unique(stores_stats_pre$store_nbr)
st.vector <- st.vector[!st.vector %in% trial_stores]
stores_stats_pre is a data frame containing a set of pre-campaign metrics for a total of 260 stores (only the first two are included here):
stores_stats_pre <- data.frame(
  store_nbr=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  year_month=c('2018-07', '2018-08', '2018-09', '2018-10', '2018-11', '2018-12', '2019-01','2018-07', '2018-08', '2018-09', '2018-10', '2018-11', '2018-12', '2019-01'),
  total_sales=c(206, 176, 278, 188, 192, 189, 154, 150, 193, 155, 168, 163, 136, 159))
I tried creating an empty data frame outside the loop, but I cannot work out how to append the correlation and the related control store number to it. Ideally, it would look something like this:
results_dataframe <- data.frame(
Control_nbr = c(1,2,3, etc.),
Correlation = c(correlation_vs_trial_store)
)
And I would modify my code like this:
results_dataframe <- data.frame(Control_nbr = integer(0), Correlation = integer(0))
my.fun <- function(trial){
  for (store in st.vector) {
    trial <- stores_stats_pre %>% filter(store_nbr == trial) %>% select(total_sales)
    control <- stores_stats_pre %>% filter(store_nbr == store) %>% select(total_sales)
    correlation <- cor(control$total_sales, trial$total_sales)
    results_dataframe[Control_nbr] <- store
    results_dataframe[Correlation] <- correlation
  }
}
But it doesn't work, and I also get this error message: "Error in cor(control$total_sales, trial$total_sales) : incompatible dimensions".
I have also read that growing objects inside loops is bad practice, so I am not sure how I should go about it.
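To make the target clearer, here is a rough base R sketch of the shape of result I am after, run on the two sample stores only. The names cor_with_trial, trial_stand_in and controls are made up for illustration, and store 1 merely stands in for a trial store because 77, 86 and 88 are not in the sample. Aligning the two stores on year_month before calling cor() is a guess on my part that stores with missing months might be behind the "incompatible dimensions" error; I have not confirmed that. vapply() is used so that nothing is grown inside a loop.

```r
# Sample data copied from the question (stores 1 and 2 only).
stores_stats_pre <- data.frame(
  store_nbr   = rep(c(1, 2), each = 7),
  year_month  = rep(c("2018-07", "2018-08", "2018-09", "2018-10",
                      "2018-11", "2018-12", "2019-01"), times = 2),
  total_sales = c(206, 176, 278, 188, 192, 189, 154,
                  150, 193, 155, 168, 163, 136, 159)
)

# Hypothetical helper: correlation of one trial store with each control store.
cor_with_trial <- function(data, trial_nbr, control_nbrs) {
  # Keep the trial store number untouched; store its sales under a new name.
  trial_sales <- data[data$store_nbr == trial_nbr,
                      c("year_month", "total_sales")]
  correlations <- vapply(control_nbrs, function(store) {
    control_sales <- data[data$store_nbr == store,
                          c("year_month", "total_sales")]
    # Assumption: align on month so both vectors have equal length and order.
    both <- merge(trial_sales, control_sales, by = "year_month",
                  suffixes = c("_trial", "_control"))
    cor(both$total_sales_trial, both$total_sales_control)
  }, numeric(1))
  # One row per control store, built in a single step.
  data.frame(Control_nbr = control_nbrs, Correlation = correlations)
}

# Store 1 is only a stand-in for a trial store in this sample.
trial_stand_in <- 1
controls <- setdiff(unique(stores_stats_pre$store_nbr), trial_stand_in)
results_dataframe <- cor_with_trial(stores_stats_pre, trial_stand_in, controls)

# Sort so the best-matching control store comes first.
results_dataframe[order(-results_dataframe$Correlation), ]
```

With the full data I would expect to call this once per trial store, passing st.vector as control_nbrs, but I do not know whether this is the idiomatic way to do it.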
Thanks