1

I am estimating a time series Error Correction Model on my data (with package 'ecm'). In below code you can see that I specify the short and long term variables with xeq and xtr.

These variables are independent variables and estimate on the dependent variable: Sales.

In this case, it is a pooled model but I want to estimate this model unit by unit (so separate for every brand). Since my dataset is rather large and consists of 360 product categories, each having 3 brands (brand 2, brand 3 and brand 4).

xeq <- DatasetThesisSynergyClean[c('lnPrice', 'lnAdvertising', 'lnDisplay', 'IntrayearCycles', 'lnCompetitorPrices', 'lnCompADV', 'lnCompDISP' , 'ADVxDISP', 'ADVxCYC', 'DISPxCYC', 'ADVxDISPxCYC')]     
xtr <- DatasetThesisSynergyClean[c('lnPrice', 'lnAdvertising', 'lnDisplay', 'IntrayearCycles', 'lnCompetitorPrices', 'lnCompADV', 'lnCompDISP', 'ADVxDISP',  'ADVxCYC', 'DISPxCYC', 'ADVxDISPxCYC')]     
model11 <- ecm(DatasetThesisSynergyClean$lnSales, xeq, xtr, includeIntercept=TRUE)
summary(model11)

What I want is to generate an output for every brand of every category. To give you glimpse of my data, please run this code:

structure(list(Week = 7:17, Category = c("2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2"), Brand = c("3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3"), Display = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), Sales = c(0, 0, 0, 0, 13.440948, 40.097397, 
32.01384, 382.169189, 2830.748779, 4524.460938, 1053.590576), 
    Price = c(0, 0, 0, 0, 5.949999, 5.95, 5.950003, 4.87759, 
    3.787015, 3.205987, 4.898724), Distribution = c(0, 0, 0, 
    0, 1.394019, 1.386989, 1.621416, 8.209759, 8.552915, 9.692097, 
    9.445554), Advertising = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0), lnSales = c(11.4945151554497, 11.633214247508, 11.5862944141137, 
    11.5412559646132, 11.4811122484454, 11.4775106999991, 11.6333660772506, 
    11.4859819773102, 11.5232680456161, 11.5572670584292, 11.5303686934256
    ), IntrayearCycles = c(4.15446534315765, 3.62757053512638, 
    2.92387946552647, 2.14946414386239, 1.40455011205262, 0.768856938870769, 
    0.291497141953598, -0.0131078404184544, -0.162984144025091, 
    -0.200882782749248, -0.182877633924882), `Competitor Advertising` = c(10584.87063, 
    224846.3243, 90657.72553, 0, 0, 0, 2396.54212, 0, 0, 0, 40343.49444
    ), `Competitor Display` = c(0.385629, 2.108133, 2.515806, 
    4.918288, 3.81749, 3.035847, 2.463194, 3.242594, 1.850399, 
    1.751096, 1.337943), `Competitor Prices` = c(5.30989, 5.372752, 
    5.3717245, 5.3295525, 5.298393, 5.319466, 5.1958415, 5.2941095, 
    5.296757, 5.294059, 5.273578), ZeroSales = c(1, 1, 1, 1, 
    0, 0, 0, 0, 0, 0, 0)), .Names = c("Week", "Category", "Brand", 
"Display", "Sales", "Price", "Distribution", "Advertising", "lnSales", 
"IntrayearCycles", "Competitor Advertising", "Competitor Display", 
"Competitor Prices", "ZeroSales"), row.names = 1255:1265, class = "data.frame")

As you can see, I have all the categories and brands stored in rows. To get an estimation on every single brand I want to write a for loop, but I don't really know how to specify the right category and brand in order to save this output separately.

Eventually want to store the coefficients, std. error, t-values and p-values, of all brands in 4 separate dataframes. But first I need to obtain the output, can you guys help me out?

Martin Schmelzer
  • 23,283
  • 6
  • 73
  • 98
PimM
  • 25
  • 6
  • There's an alternative way to build a model for each brand using `dplyr`, `purrr` packages. There many posts online that have to do with linear regression models. but you should be able to adjust it to your case. See some info here https://www.r-bloggers.com/running-a-model-on-separate-groups/ and here https://stackoverflow.com/questions/38621556/advice-on-usage-of-dplyr-do-vs-purrr-map-tidynest-for-predictions – AntoniosK Dec 14 '17 at 13:44

2 Answers2

0

I would suggest you take a look at some of the tidyverse packages, and consider using a vectorised approach combining split(df, list(df$Category, df$Group)) and purrr's map() function to apply a function to each of your smaller datasets. The code would be something like this:

df %>% 
  split(f = list(.$Category, .$Brand)) %>% 
  map(a_function_for_each_group) %>%
  bind_rows()

I hope i have understood your question correctly.

Peter H.
  • 1,995
  • 8
  • 26
  • I could apply this, but how should I proceed further then? – PimM Dec 14 '17 at 14:34
  • It is difficult to suggest a way to proceed without knowing the structure and content of the `a_function_for_each_group` output. – Peter H. Dec 14 '17 at 14:39
0

You could use dplyr like this:

f <- function(.) {
  xeq <- as.data.frame(select(., lnPrice, lnAdvertising, lnDisplay, IntrayearCycles, lnCompetitorPrices, lnCompADV, lnCompDISP, ADVxDISP, ADVxCYC, DISPxCYC, ADVxDISPxCYC))
  xtr <- as.data.frame(select(., lnPrice, lnAdvertising, lnDisplay, IntrayearCycles, lnCompetitorPrices, lnCompADV, lnCompDISP, ADVxDISP,  ADVxCYC, DISPxCYC, ADVxDISPxCYC))
  print(xeq)
  print(xtr)
  summary(ecm(.$lnSales, xeq, xtr, includeIntercept = TRUE))
}


Models <- DatasetThesisSynergyClean %>% 
  group_by(Category, Brand) %>% 
  do(Model = f(.))


Models$Category
[1] "2" "3"
Models$Brand
[1] "3" "3"
Models$Model
[[1]]

Call:
lm(formula = dy ~ ., data = x)
# ... and so on

You end up with a list of 3 items (the categories, brands and model summary objects) and length equal to unique category/brand combinations. Could not try it properly, since I do not have the complete data. Model summary for Category 3, Brand 3:

Models$Model[[which(Models$Category == 3 & Models$Brand == 3)]]

Update:

If you want standalone object for each model you can give them corresponding names and use list2env():

names(Models$Model) <- paste0("C", Models$Category, "B", Models$Brand)
list2env(Models$Model, .GlobalEnv)
Martin Schmelzer
  • 23,283
  • 6
  • 73
  • 98
  • This seems to work! I can see the DF 'models' and when I run the last three lines I see all the output showing up in the console. I only do not get how I should obtain the model output for one specific category&brand. – PimM Dec 14 '17 at 14:36
  • Well `Models$Model[[1]]` corresponds to Cat 2, Brand 3 for example... `Models$Category[1]` and `Models$Brand[1]` – Martin Schmelzer Dec 14 '17 at 14:39
  • So you could do this: `Models$Model[[which(Models$Category == 3 & Models$Brand == 3)]]` – Martin Schmelzer Dec 14 '17 at 14:51
  • Ah of course. The weird thing is that in my estimations ouptut, there are also variables containing NA's. That is weird, since my dataset does not contain any NA's at all. – PimM Dec 14 '17 at 14:54
  • Remove those variable that are zero constants. This will only prettify the summary output. The coefficients of the other variables wont change. – Martin Schmelzer Dec 14 '17 at 15:16
  • Ah, I see. I tackled the -inf problem beforehand. It is sometimes indeed that variables are 0, so there is nothing to make an estimation on since there is no effect. Many thanks for the help! – PimM Dec 14 '17 at 15:17
  • My dataset contains many zero's since it is rather big, so removing these would be a massive job, right? Because then I need to remove the whole brand, and so I lose good data as well. – PimM Dec 14 '17 at 15:27
  • I do not know your data. I do not know how many observations you have for each cat/brand combination. Maybe other data transformations help. – Martin Schmelzer Dec 14 '17 at 15:35
  • Almost 200.000 observations, haha. But, do you perhaps know how I could sum the RSS of ALL model outputs. I need to do a chow test for pooling or not. I think this works in a for loop as well? By first saving the RSS for every model and then summing it up. I know I opbtain these values with an Anova, but how do I obtain them for all of them? – PimM Dec 14 '17 at 15:48
  • Well I am not here to write your Bachelor thesis (?) ;) you have all the RSS in the Models object already. Use `lapply()` on that list to extract/do something. – Martin Schmelzer Dec 14 '17 at 15:51
  • Of course not! I'm not that experienced in R and they expect a bit too much from me in my Master's Thesis. It gives me errors in specifying the specific output for a model.. hopefully I find a solution. – PimM Dec 14 '17 at 16:12
  • One remaining question, I got the outputs in the console by calling them with [ .. ]. Is there also a way to save all these outputs separately? – PimM Dec 15 '17 at 12:20
  • Yes of course. See my update. Oh, notice that you can close questions by accepting an answer that is satisfying (concerning all your questions). – Martin Schmelzer Dec 15 '17 at 16:19
  • Ah, that works. Perfect! I know need to extract all residuals (RSS actually), p-values, coefficients and t-values for these dataframes. Do you know how to call these from the dataframes all together? – PimM Dec 16 '17 at 10:40
  • I need to export all my regression outputs into a new dataframe and I want to stack these in rows per variable. See my new post [link](https://stackoverflow.com/questions/47891753/exporting-lm-summary-output-to-dataframe-including-na). Due to the NA's I cannot export all outputs in the same format. I found a way that in your LM code, you an include singular.ok = true to include the NA's (singularities) in the regressions. The weird thing is also that when I want summary output, I need print() instead of summary(). Can you help me with solving this? – PimM Dec 20 '17 at 08:55