
I have a single data frame containing x unique combinations of region and channel. I need to fit a separate regression model for each of the x combinations using some sort of loop.

region  channel         date         trials    spend    
EMEA    display       2015-01-01       62     17875.27   
APAC    banner        2015-01-01       65     18140.93

Something to the effect of

i=1
j=1
for r in region{
   for ch in channel{
       df1 = df[df$region == r & df$channel == ch, ]
       model[[i,j]] = lm(trials ~ spend, data = df1)
       j = j+1}
   i = i+1 }

If someone also knows a way of storing a unique identifier, such as region+channel, to help identify the regression models, that would be very helpful too.

Marcus
  • What you've got there looks like a good start; can you tell us where it breaks down? Can you please include data that will provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Ben Bolker May 16 '16 at 19:08
  • Using this code I get the following error: `Error in model[[i, j]] = lm(trials ~ spend, data = df1): [[ ]] subscript out of bounds` – Marcus May 16 '16 at 19:13

2 Answers


A plyr solution:

set.seed(1)
d <- data.frame(region = letters[1:2],
                channel = LETTERS[3:6],
                trials = runif(20),
                spend = runif(20))

Make a list of results (i.e. split d by region and channel, run lm on each chunk with the specified formula, and return the results as a list):

library(plyr)
res <- dlply(d,c("region","channel"), lm,
             formula=trials~spend)

Extract coefficients as a data frame:

ldply(res,coef)
##   region channel (Intercept)      spend
## 1      a       C   0.3359747  0.2444105
## 2      a       E   0.7767959 -0.3745419
## 3      b       D   0.7409942 -0.8084751
## 4      b       F   1.0797439 -1.0872158

Note that the result has your desired region/channel identifiers in it ...
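
If an explicit loop is preferred, as in the question, the same split-and-fit idea can be sketched in base R without plyr. This is only a minimal sketch on the dummy data above; the names `combos`, `models`, and `key` are made up for illustration. Storing each fit in a plain list under a name avoids the two-index `model[[i, j]]` assignment, which only works on an object that already has matching dimensions.

```r
# Same dummy data as above
set.seed(1)
d <- data.frame(region = letters[1:2],
                channel = LETTERS[3:6],
                trials = runif(20),
                spend = runif(20))

# One row per region/channel pair that actually occurs in the data
combos <- unique(d[, c("region", "channel")])

models <- list()  # plain list; elements are added by name
for (k in seq_len(nrow(combos))) {
  r  <- combos$region[k]
  ch <- combos$channel[k]
  key <- paste(r, ch, sep = "_")  # identifier such as "a_C" (hypothetical naming scheme)
  models[[key]] <- lm(trials ~ spend,
                      data = d[d$region == r & d$channel == ch, ])
}

names(models)
```

Each model can then be looked up by its identifier, for example `models[["a_C"]]`. Looping over the pairs that occur in the data, rather than over every region crossed with every channel, avoids calling `lm` on empty subsets.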

Ben Bolker
  • I appreciate the feedback, but SO policy is to avoid using comments for "thank you"s; recommended channel is to upvote if the answer is useful, and click the check-mark if it answers your question/solves your problem – Ben Bolker May 16 '16 at 20:29

Split the data into a list by the combination of the two columns, then run lm on each subset in a loop with lapply; see this example:

# dummy data
set.seed(1)
d <- data.frame(region = letters[1:2],
                channel = LETTERS[3:6],
                trials = runif(20),
                spend = runif(20))

# split by 2 column combo
dSplit <- split(d, paste(d$region, d$channel, sep = "_"))

# run lm for each subset
res <- lapply(dSplit, lm, formula = trials ~ spend)

# check names
names(res)
# [1] "a_C" "a_E" "b_D" "b_F"

# lm result for selected combo "a_C"
res$a_C
# Call:
#   lm(formula = trials ~ spend, data = i)
# 
# Coefficients:
#   (Intercept)        spend  
# 0.3360       0.2444  
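
To get the identifiers and coefficients side by side, the list of models can be flattened into a data frame using base R only. The following is a sketch under the assumption that region and channel values never contain an underscore, since the identifiers are recovered by splitting the list names on "_"; the names `coefs`, `ids`, and `coef_table` are illustrative.

```r
# Same dummy data and models as above
set.seed(1)
d <- data.frame(region = letters[1:2],
                channel = LETTERS[3:6],
                trials = runif(20),
                spend = runif(20))
dSplit <- split(d, paste(d$region, d$channel, sep = "_"))
res <- lapply(dSplit, lm, formula = trials ~ spend)

# Stack the coefficient vectors into a matrix, one row per model
coefs <- do.call(rbind, lapply(res, coef))

# Recover region and channel from the list names ("a_C" -> "a", "C");
# assumes neither value contains "_"
ids <- do.call(rbind, strsplit(names(res), "_", fixed = TRUE))

coef_table <- data.frame(region = ids[, 1],
                         channel = ids[, 2],
                         intercept = coefs[, "(Intercept)"],
                         slope = coefs[, "spend"],
                         r_squared = sapply(res, function(m) summary(m)$r.squared),
                         row.names = NULL)
coef_table
```

The `r_squared` column is only one example of a per-model statistic; anything else available from `summary()` can be pulled out with the same `sapply` pattern.
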
zx8754