The data set has five columns: A, B, C, D, and Portfolio. I want to fit a linear regression model for each portfolio, so I split the data into one subset per portfolio, fit the model on each subset, and check the summaries. The data frame looks like this:

      A    B    C    D    Portfolio
1           ...               11
2           ...               22
3           ...               13
4           ...               11
5           ...               21
6           ...               21
7           ...               23
8           ...               12
9           ...               11
10          ...               12 
11          ...               22
...                       

My current code is:

Portfolio_11<-subset(df, Portfolio==11)
Portfolio_12<-subset(df, Portfolio==12)
Portfolio_13<-subset(df, Portfolio==13)
Portfolio_21<-subset(df, Portfolio==21)
Portfolio_22<-subset(df, Portfolio==22)
Portfolio_23<-subset(df, Portfolio==23)

Reg_11<-lm(A ~ B + C + D, data=Portfolio_11)
Reg_12<-lm(A ~ B + C + D, data=Portfolio_12)
Reg_13<-lm(A ~ B + C + D, data=Portfolio_13)
Reg_21<-lm(A ~ B + C + D, data=Portfolio_21)
Reg_22<-lm(A ~ B + C + D, data=Portfolio_22)
Reg_23<-lm(A ~ B + C + D, data=Portfolio_23)

summary(Reg_11)
summary(Reg_12)
summary(Reg_13)
summary(Reg_21)
summary(Reg_22)
summary(Reg_23)

I tried to simplify the R code with a loop, something like this:

for (i=1:3, j=1:3){
Portfolio_ij<-subset(df, Portfolio==ij)
Reg_ij<-lm(A ~ B + C + D, data=Portfolio_ij)
summary(Reg_ij)
}

However, I am a beginner in R and don't really understand how loops work, so I would like to learn. Thank you so much.

Weber Chen
  • could you please turn it into a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1) ? – Vincent Bonhomme Nov 27 '16 at 13:06

4 Answers

We can use one of the group-by functions, for example with data.table:

library(data.table)
dtSummary <- setDT(df)[,  list(list(summary(lm(A ~ B + C + D)))), by = Portfolio]
dtSummary$V1
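
If the coefficients are what you are after, a variation on the same idea returns them as a table with one row per portfolio. This is only a sketch under assumptions: the data frame below is simulated (it is not the asker's data), and `dt` and `coefTable` are names made up for illustration.

```r
library(data.table)

# Simulated stand-in for the real data (assumption: 20 rows per portfolio)
set.seed(42)
portfolios <- c(11, 12, 13, 21, 22, 23)
df <- data.frame(A = rnorm(120), B = rnorm(120), C = rnorm(120), D = rnorm(120),
                 Portfolio = rep(portfolios, each = 20))

# as.data.table() makes a copy, so df stays a plain data.frame
# (setDT() would convert df in place instead)
dt <- as.data.table(df)

# Fit one model per portfolio; .SD holds the rows of the current group.
# as.list() turns the named coefficient vector into one column per coefficient.
coefTable <- dt[, as.list(coef(lm(A ~ B + C + D, data = .SD))), by = Portfolio]
print(coefTable)
```

The result has the columns Portfolio, (Intercept), B, C and D, which is often easier to compare across groups than six separate summaries.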
akrun

To make life easier for yourself, use one of the R packages for data munging. Akrun has already mentioned data.table; this is also a classic use case for dplyr's do:

library(dplyr)
df %>%
    group_by(Portfolio) %>%
    do(smry=summary(lm(A ~ B + C + D, data=.)))
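
To look at an individual summary afterwards, one option is to store the result and name the list column by portfolio. A minimal sketch, assuming simulated data in place of the real data frame; `res` and `smryList` are illustrative names, not part of dplyr.

```r
library(dplyr)

# Simulated stand-in for the real data (assumption: 20 rows per portfolio)
set.seed(42)
df <- data.frame(A = rnorm(120), B = rnorm(120), C = rnorm(120), D = rnorm(120),
                 Portfolio = rep(c(11, 12, 13, 21, 22, 23), each = 20))

res <- df %>%
    group_by(Portfolio) %>%
    do(smry = summary(lm(A ~ B + C + D, data = .)))

# smry is a list column with one summary per group;
# naming it by portfolio allows lookup by code
smryList <- setNames(res$smry, res$Portfolio)
print(smryList[["21"]])
```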
Hong Ooi

This is a classic case for the split-apply-combine approach, or at least the split-apply part, since it's not clear what you want to do with the output. Here's one way to do that in base R, returning the results in a list called Summaries:

Summaries <- lapply(split(df, df$Portfolio), function(i) summary(lm(A ~ B + C + D, data = i)))

Working out from the inside, you:

  1. Use split to break the original data into a list composed of the desired subsets, defined here by the unique values of df$Portfolio.
  2. Use lapply to iterate the modeling and model-summarizing functions over the elements of the list created in step 1.

The result is a list (Summaries), the ith element of which corresponds to the ith subset of df$Portfolio. Conveniently, the list elements will have names that correspond to the unique values of df$Portfolio, so you can inspect them with Summaries[["21"]], for example. Or, if you just want to see the results in your terminal or markdown or whatever, drop the Summaries <- part.
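
As a rough illustration of working with that list, here is a sketch on simulated data (the data frame is an assumption made so the example runs on its own; `rsq` is an illustrative name):

```r
# Simulated stand-in for the real data (assumption: 20 rows per portfolio)
set.seed(42)
df <- data.frame(A = rnorm(120), B = rnorm(120), C = rnorm(120), D = rnorm(120),
                 Portfolio = rep(c(11, 12, 13, 21, 22, 23), each = 20))

Summaries <- lapply(split(df, df$Portfolio),
                    function(i) summary(lm(A ~ B + C + D, data = i)))

# The list is named after the portfolio codes (as character strings)
print(names(Summaries))

# Coefficient table (estimate, std. error, t value, p value) for one portfolio
print(Summaries[["21"]]$coefficients)

# R-squared of every model, collected into a named numeric vector
rsq <- sapply(Summaries, function(s) s$r.squared)
print(rsq)
```

Because each element is an ordinary summary.lm object, the same sapply pattern works for other components such as adj.r.squared or sigma.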

ulfelder

Using base R, you could try:

# create the portfolio codes 11, 21, 12, 22, 13, 23 (first digit 1:2, second digit 1:3)
subs <- apply(expand.grid(1:2, 1:3), 1, function(x) as.numeric(paste0(x, collapse="")))
# loop along these combinations. Note the print(): results are not auto-printed inside a for loop.
for (i in subs)
   print(summary(lm(A ~ B + C + D, data=subset(df, Portfolio==i))))

But as asked in the comments, a reproducible example would help.

Here is one with a simulated dataset:

# same as above
subs <- apply(expand.grid(1:2, 1:3), 1, function(x) as.numeric(paste0(x, collapse="")))

# here we create the dataset
n <- 50 # we want 50 rows
set.seed(1) # for the sake of reproducibility
df <- data.frame(A=rnorm(n), B=rnorm(n), C=rnorm(n), D=rnorm(n), Portfolio=sample(subs, n, replace=TRUE))

# now we can apply the loop:
for (i in subs){
  cat(rep("*", 20), "\nlm for Portfolio =", i, '\n')  # a cheap console displayer
  print(summary(lm(A ~ B + C + D, data=subset(df, Portfolio==i))))
}

But as others have answered, both the data.table and dplyr packages give a more straightforward and generic syntax than base R.
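
On the loop itself: a for loop in R iterates over the elements of one vector (`for (x in vec)`), and R does not substitute loop indices into a name such as Reg_ij. A common way around that is to keep the fitted models in a named list. The sketch below assumes simulated data, and `codes`, `models` and `subset_df` are illustrative names.

```r
# Simulated stand-in for the real data (assumption: 20 rows per portfolio)
set.seed(42)
df <- data.frame(A = rnorm(120), B = rnorm(120), C = rnorm(120), D = rnorm(120),
                 Portfolio = rep(c(11, 12, 13, 21, 22, 23), each = 20))

# Loop over the portfolio codes that actually occur in the data
codes <- sort(unique(df$Portfolio))

# Pre-allocate a list named "11", "12", ... to hold one model per portfolio
models <- setNames(vector("list", length(codes)), codes)

for (code in codes) {
  subset_df <- subset(df, Portfolio == code)
  # Index the list by the code as a character string instead of building Reg_11, Reg_12, ...
  models[[as.character(code)]] <- lm(A ~ B + C + D, data = subset_df)
}

# print() is needed inside loops and is harmless at top level
print(summary(models[["11"]]))
```

Keeping the lm objects (rather than only their summaries) means you can still call predict, residuals or anova on any of them later.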

Vincent Bonhomme