I just started using R for statistical analysis and I am still learning. I have an issue with creating loops in R, and I was wondering if anyone can help me with the following case. For me it seems impossible, but for some of you it is probably a piece of cake. I have a dataset for different firms across different years. For each firm I have several observations in the same year, and I need to run the following regression for each firm in each year (I have more than 1,000 firms, so running the regression for each firm separately is not practical): Ri = α0 + β1 Rm + β2 Rz + Ɛ

The data I have looks like the following example:
Year   Firm    Ri    Rm    Rz
2009   A       30    55    85
2009   A       11    55    85
2009   A       1     55    85
2010   A       7     55    85
2010   A       15    55    85
2011   A       20    55    85
2011   A       3.5   55    85
2011   A       8     55    85
2009   B       24    55    85
2009   B       30    55    85
2009   B       25    55    85
2010   B       5.2   55    85
2010   B       11.8  55    85
2011   B       78    55    85
2011   B       90    55    85
2011   B       57    55    85

I need to obtain B1 and B2 (the estimates of β1 and β2) and the error term Ɛ for each firm in each year, like this:

Year   Firm    B1    B2    Ɛ
2009   A       0.30  0.55  0.85
2010   A       0.11  0.55  0.85
2011   A       0.1   0.55  0.85
2009   B       0.7   0.55  0.85
2010   B       0.15  0.55  0.85
2011   B       0.20  0.55  0.85
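One thing worth checking first: in the sample rows Rm and Rz are 55 and 85 in every row, so within a firm-year they do not vary. A constant regressor is perfectly collinear with the intercept α0, so lm() cannot estimate B1 and B2 and reports NA for them. A minimal check on the firm A, 2009 rows from the sample table (this assumes the real data has Rm and Rz varying within each firm-year, and that the sample values are just placeholders):

```r
# Firm A, 2009 rows copied from the sample data in the question
a2009 <- data.frame(
  Ri = c(30, 11, 1),
  Rm = c(55, 55, 55),
  Rz = c(85, 85, 85)
)

fit <- lm(Ri ~ Rm + Rz, data = a2009)

# Rm and Rz are constant, so they are aliased with the intercept:
# the intercept is mean(Ri) = 14 and the Rm and Rz coefficients are NA
coef(fit)
```

With real, varying Rm and Rz values the same call returns numeric estimates for both slopes.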

Thank you in advance for your help.

hbtf.1046

3 Answers

You could do this using loops and subset, but you could also use mapply, like this (I've made a larger dataset to be able to demonstrate properly):

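set.seed(1)  # make the simulated data reproducible
# Simulated data: 3 years x 2 firms, 15 observations per firm-year
Year <- sort(rep.int(2009:2011, 30))
Firm <- gl(n = 2, k = 15, length = 90, labels = c('A', 'B'))
dta <- data.frame(Year, Firm, Ri = rnorm(90, 5, 2), Rm = rnorm(90, 2, 1), Rz = rnorm(90, -1, 0.5))

# All Year x Firm combinations (Var1 = Year, Var2 = Firm)
filt <- expand.grid(unique(dta$Year), unique(dta$Firm))

# One lm per combination; subset keeps only the matching rows
op <- mapply(function(x, y) lm(Ri ~ Rm + Rz, data = dta, subset = Year == x & Firm == y), 
             filt$Var1, filt$Var2, SIMPLIFY = FALSE)

# Coefficients, one column per row of filt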
sapply(op,coef)
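The columns of sapply(op, coef) are unlabeled, so it can help to bind them back to the grid of combinations to get the Year/Firm/B1/B2/e layout from the question. A sketch that re-creates the simulated data so it runs on its own; the seed, the named expand.grid columns, and using the residual standard error (summary(fit)$sigma) for the "e" column are assumptions, not part of the answer:

```r
set.seed(42)  # hypothetical seed, only for reproducibility
Year <- sort(rep.int(2009:2011, 30))
Firm <- gl(n = 2, k = 15, length = 90, labels = c('A', 'B'))
dta <- data.frame(Year, Firm, Ri = rnorm(90, 5, 2), Rm = rnorm(90, 2, 1), Rz = rnorm(90, -1, 0.5))

# Named grid columns make the final table self-describing
filt <- expand.grid(Year = unique(dta$Year), Firm = unique(dta$Firm))

op <- mapply(function(x, y) lm(Ri ~ Rm + Rz, data = dta, subset = Year == x & Firm == y),
             filt$Year, filt$Firm, SIMPLIFY = FALSE)

# Row k of est holds the coefficients of the model for row k of filt
est <- t(sapply(op, coef))
res <- data.frame(filt,
                  B1 = est[, "Rm"],
                  B2 = est[, "Rz"],
                  e  = sapply(op, function(m) summary(m)$sigma))  # residual standard error
res
```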
David_B
You can loop through each Firm and Year to fit a separate lm for each combination, like so:

#Assume your data frame is named df
#Convert Firm and Year to factor variables
df$Firm <- as.factor(df$Firm)
df$Year <- as.factor(df$Year)

#Loop through each level in Firm and Year and generate lm for each
for(i in levels(df$Firm)){
  for(j in levels(df$Year)){
    # df$Year, not df$Y: the short name only worked through partial matching
    assign(paste0('lm', i, j), lm(Ri~Rm+Rz, data=df[df$Firm==i & df$Year==j,]))
  }
}
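A variation on the loop above, sketched under the assumption that keeping the models together is more convenient than creating lmA2009, lmA2010, ... as separate objects with assign(): store each fit in a named list under the same name. The data frame here is simulated (hypothetical values) so the block runs on its own:

```r
set.seed(1)  # simulated stand-in for the question's df (hypothetical data)
df <- data.frame(
  Year = rep(2009:2011, each = 20),
  Firm = rep(c("A", "B"), times = 30),
  Ri = rnorm(60), Rm = rnorm(60), Rz = rnorm(60)
)
df$Firm <- as.factor(df$Firm)
df$Year <- as.factor(df$Year)

# Collect the models in one named list instead of separate global objects
models <- list()
for (i in levels(df$Firm)) {
  for (j in levels(df$Year)) {
    models[[paste0("lm", i, j)]] <- lm(Ri ~ Rm + Rz, data = df[df$Firm == i & df$Year == j, ])
  }
}

names(models)
coef(models[["lmA2009"]])
```

All models can then be processed at once, for example with lapply(models, coef).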
Gaurav Bansal
Using the subset argument of lm() and two for loops:

for(i in unique(df$Year)) {
  for(j in unique(df$Firm)) {
    print(i)
    print(j)
    print(lm(Ri ~ Rm + Rz, data = df, subset = df$Year == i & df$Firm == j))
  }
}
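The same per-firm-year fits can also be produced without writing the loops, using base R's by(), which splits a data frame by one or more grouping columns and applies a function to each piece. A minimal sketch on simulated data (the values are hypothetical; the column names follow the question):

```r
set.seed(1)  # simulated stand-in for the question's df (hypothetical data)
df <- data.frame(
  Year = rep(2009:2011, each = 20),
  Firm = rep(c("A", "B"), times = 30),
  Ri = rnorm(60), Rm = rnorm(60), Rz = rnorm(60)
)

# by() splits df into one data frame per Year x Firm combination
# and fits the regression on each piece
fits <- by(df, df[c("Year", "Firm")], function(d) lm(Ri ~ Rm + Rz, data = d))

# Coefficients for every firm-year, one column per combination
sapply(fits, coef)
```

Printing fits shows each model labelled with its Year and Firm, much like the print() calls in the loop.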

To produce the output table from your question:

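# Preallocate one row per Year x Firm combination
m <- data.frame(matrix(ncol = 5, nrow = length(unique(df$Year))*length(unique(df$Firm))))
l <- 0
for(i in unique(df$Year)) {
  for(j in unique(df$Firm)) {
    l <- l + 1
    mod <- lm(Ri ~ Rm + Rz, data = df, subset = df$Year==i & df$Firm ==j)
    # c() turns every value into a character string
    m[l,] <- c(i,
               as.character(j), 
               mod$coefficients[2],   # B1: coefficient on Rm
               mod$coefficients[3],   # B2: coefficient on Rz
               summary(mod)$sigma)    # residual standard error
  }
}
# Convert the estimate columns back to numbers
m[3:5] <- lapply(m[3:5], as.numeric)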
names(m) <- c("Year", "Firm", "B1", "B2", "e")
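Because the preallocation approach goes through c(), every value is first turned into text. A sketch of an alternative that keeps the columns numeric from the start: split() the data into one data frame per firm-year, fit and summarize each piece, then stack the one-row results with do.call(rbind, ...). The data are simulated (hypothetical), and e is again taken to be the residual standard error, which is one reading of "the error term" in the question:

```r
set.seed(1)  # simulated stand-in for the question's df (hypothetical data)
df <- data.frame(
  Year = rep(2009:2011, each = 20),
  Firm = rep(c("A", "B"), times = 30),
  Ri = rnorm(60), Rm = rnorm(60), Rz = rnorm(60)
)

# One data frame per firm-year; drop = TRUE skips combinations with no rows
pieces <- split(df, list(df$Year, df$Firm), drop = TRUE)

rows <- lapply(pieces, function(d) {
  mod <- lm(Ri ~ Rm + Rz, data = d)
  data.frame(Year = d$Year[1], Firm = d$Firm[1],
             B1 = unname(coef(mod)["Rm"]),
             B2 = unname(coef(mod)["Rz"]),
             e  = summary(mod)$sigma)  # residual standard error
})

m <- do.call(rbind, rows)
rownames(m) <- NULL
m
```

The rows come out ordered by firm and then year, matching the layout of the desired output.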
Andrew Taylor