How do I select two factors and perform regression

Question

I have a dataset like this:

library(tidyverse)
set.seed(123)
Data <- data.frame(
    X = sample(c("A", "B", "C"), 20, replace = TRUE),
    Y = sample(1:20)
)
Data %>%
    arrange(X)

I would like to run a series of regressions in which the DV is Y but the independent variables for each regression are factors taken two at a time. For instance, A&B, A&C, B&A, B&C. Thanks for your help.

I don't understand what type of model you are trying to fit here. Are you trying to subset the data for each regression? It would be better to include a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input (that doesn't include "...") and desired output that can be used to test and verify possible solutions. Do you expect the results from A&B to be different from B&A? — MrFlick, Apr 16 '20 at 21:51
I have edited the question for a reproducible example. A&B may not be different to B&A but I will eventually need a count. Thanks for your help. — queserasera, Apr 16 '20 at 22:08
I still don't understand what you mean by "independent variables for each regression are factors taken two at a time". The only variable you have besides Y is X. How do you want to turn the values of `X` into variables? Do you want to subset and reshape on each iteration? The sample input is reproducible, but it's still not clear what your desired output is. What values do you expect for this input? — MrFlick, Apr 16 '20 at 22:23
Ok, first I want to run a regression where the factor levels A and B are predictors and the corresponding Y values are the DV, then I want to run another regression with factor levels A and C as predictors and their corresponding Y values as the DV, and so on. — queserasera, Apr 16 '20 at 22:25
Data in the example has only one factor with 3 levels: A, B, C. Do you need to perform a regression of Y vs X using two of the three levels, i.e. 3 regressions on subsets of Data: Data[Data$X != levels(Data$X)[i], ] for i = 1, 2, and 3? — Marcelo Fernando Befumo, Apr 16 '20 at 23:47
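On the A&B versus B&A question: a minimal sketch (base R, regenerating the example data; the names `ab`, `fit_ab` and `fit_ba` are just illustrative) showing that fitting the A/B subset with either level as the reference category gives the same model. Only the parameterisation changes: the intercept becomes the mean of the reference level and the sign of the level coefficient flips, so one regression per unordered pair should be enough.

```r
# Regenerate the question's example data (base R only; no tidyverse needed)
set.seed(123)
Data <- data.frame(
  X = sample(c("A", "B", "C"), 20, replace = TRUE),
  Y = sample(1:20)
)

# Keep only the rows for levels A and B, and store X as a factor
# (factor() also drops an unused level C if X was already a factor)
ab <- subset(Data, X %in% c("A", "B"))
ab$X <- factor(ab$X)

# "A&B": A is the reference level (default alphabetical order)
fit_ab <- lm(Y ~ X, data = ab)

# "B&A": same rows, but B is made the reference level
ab_rev <- ab
ab_rev$X <- relevel(ab_rev$X, ref = "B")
fit_ba <- lm(Y ~ X, data = ab_rev)

coef(fit_ab)  # intercept = mean of A, XB = mean(B) - mean(A)
coef(fit_ba)  # intercept = mean of B, XA = mean(A) - mean(B)

# Both fits describe the same model, so the fitted values agree
all.equal(fitted(fit_ab), fitted(fit_ba))
```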

Obim · Accepted Answer · 2020-04-17T16:55:55.340

For linear regression:

> combos <- as.data.frame(x = combn(x = c("A", "B", "C"), m = 2))
names(combos) <- sapply(
  X = 1:3, 
  FUN = function(x){paste(combos[,x], collapse = "")}
)

> fit.list <- list()
> for (combo in names(combos)){
  fit.list[[combo]] <- subset(Data, X %in% combos[,combo]) %>% 
    lm(formula = .$Y ~ .$X, data = .)
}

> fit.list
$AB

Call:
lm(formula = .$Y ~ .$X, data = .)

Coefficients:
(Intercept)         .$XB  
       10.4          3.6  


$AC

Call:
lm(formula = .$Y ~ .$X, data = .)

Coefficients:
(Intercept)         .$XC  
       10.4         -2.9  


$BC

Call:
lm(formula = .$Y ~ .$X, data = .)

Coefficients:
(Intercept)         .$XC  
       14.0         -6.5
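If you would rather not rely on the magrittr `.` placeholder inside the formula, here is a sketch of the same idea in base R using `combn()` and `lapply()`. It assumes the question's `Data` (regenerated so the block runs on its own); `level_pairs` and `coef_table` are illustrative names. Because the same subsets and the same model are used, the estimates should match the loop above, but with bare column names in the formula the coefficients are labelled `XB`/`XC` rather than `.$XB`/`.$XC`.

```r
# Example data from the question (base R only; no tidyverse needed)
set.seed(123)
Data <- data.frame(
  X = sample(c("A", "B", "C"), 20, replace = TRUE),
  Y = sample(1:20)
)

# All unordered pairs of levels as a 2 x 3 matrix: (A,B), (A,C), (B,C)
level_pairs <- combn(sort(unique(as.character(Data$X))), 2)

# Fit Y ~ X on the rows belonging to each pair; lm() drops unused levels
fit.list <- lapply(seq_len(ncol(level_pairs)), function(j) {
  lm(Y ~ X, data = subset(Data, X %in% level_pairs[, j]))
})
names(fit.list) <- apply(level_pairs, 2, paste, collapse = "")

# One row per pair: intercept (mean of the first level) and the
# difference between the second level's mean and the first level's
coef_table <- data.frame(
  pair       = names(fit.list),
  intercept  = vapply(fit.list, function(f) coef(f)[[1]], numeric(1)),
  difference = vapply(fit.list, function(f) coef(f)[[2]], numeric(1)),
  row.names  = NULL
)
coef_table
```

Collecting the results in a data frame makes it easy to sort or count the pairs afterwards.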

EDIT: To add a covariate (e.g. Z), one way is to add it as a new column in the data.frame and then add the column name to the model formula:

> set.seed(123)
> Data <- data.frame(
  X = sample(c("A", "B", "C"), 20, replace = TRUE),
  Y = sample(1:20),
  Z = sample(1:20)
)

> fit.list <- list()
> for (combo in names(combos)){
  fit.list[[combo]] <- subset(Data, X %in% combos[,combo]) %>% 
    lm(formula = .$Y ~ .$X + .$Z, data = .)
}
> fit.list
$AB

Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)

Coefficients:
(Intercept)         .$XB          .$Z  
     3.6697       3.1921       0.5099  


$AC

Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)

Coefficients:
(Intercept)         .$XC          .$Z  
     7.9306      -1.5063       0.1871  


$BC

Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)

Coefficients:
(Intercept)         .$XC          .$Z  
   14.89888     -7.02970     -0.06421
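To make the covariate list easy to change, the formula can also be built from character vectors with base R's `reformulate()`. A sketch, assuming the three-column `Data` above (regenerated here); `covariates` and `model_formula` are hypothetical names, and you would extend `covariates` with your own column names.

```r
# Example data with one covariate Z, as in the edit above
set.seed(123)
Data <- data.frame(
  X = sample(c("A", "B", "C"), 20, replace = TRUE),
  Y = sample(1:20),
  Z = sample(1:20)
)

# Hypothetical vector of covariate column names; add more as needed
covariates <- c("Z")

# Builds the formula Y ~ X + Z from character vectors
model_formula <- reformulate(c("X", covariates), response = "Y")

# Same pairwise subsetting as before, one fit per pair of levels
level_pairs <- combn(sort(unique(as.character(Data$X))), 2)
fit.list <- lapply(seq_len(ncol(level_pairs)), function(j) {
  lm(model_formula, data = subset(Data, X %in% level_pairs[, j]))
})
names(fit.list) <- apply(level_pairs, 2, paste, collapse = "")

lapply(fit.list, coef)
```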

Thanks, that works. If I want to add a covariate such as X1 = sample(1:20) to the model, how would you modify the code? — queserasera, Apr 17 '20 at 16:42
Thanks so much. Any idea why the output is not showing p values? — queserasera, Apr 17 '20 at 17:30
You can use the summary() function to obtain the statistics of each fit. `summary(fit.list$AB)` — Obim, Apr 17 '20 at 19:24
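To pull the p-value of the level difference out of every fit at once, one option is to index the coefficient matrix returned by `summary()`. A sketch assuming a `fit.list` of `lm` fits like the one in the answer (rebuilt here in base R so the block is self-contained); it assumes row 2 is the level-difference term, which holds for `Y ~ X` with two levels.

```r
# Rebuild a list of pairwise fits (base R only)
set.seed(123)
Data <- data.frame(
  X = sample(c("A", "B", "C"), 20, replace = TRUE),
  Y = sample(1:20)
)
level_pairs <- combn(sort(unique(as.character(Data$X))), 2)
fit.list <- lapply(seq_len(ncol(level_pairs)), function(j) {
  lm(Y ~ X, data = subset(Data, X %in% level_pairs[, j]))
})
names(fit.list) <- apply(level_pairs, 2, paste, collapse = "")

# summary()$coefficients is a matrix with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)";
# row 2 is the level-difference term (XB or XC)
p_values <- vapply(
  fit.list,
  function(f) summary(f)$coefficients[2, "Pr(>|t|)"],
  numeric(1)
)
p_values
```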
