I have a dataset like this:

library(tidyverse)
set.seed(123)
Data <- data.frame(
    X = sample(c("A", "B", "C"), 20, replace = TRUE),
    Y = sample(1:20)
)
Data %>%
    arrange(X)

I would like to run a series of regressions in which the DV is Y and the predictor in each regression is the factor X restricted to two of its levels at a time, for instance A&B, A&C, B&A, B&C. Thanks for your help.

  • I don't understand what type of model you are trying to fit here. Are you trying to subset the data for each regression? It would be better to include a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input (that doesn't include "...") and desired output that can be used to test and verify possible solutions. Do you expect the results from A&B to be different from B&A? – MrFlick Apr 16 '20 at 21:51
  • I have edited the question for a reproducible example. A&B may not be different to B&A but I will eventually need a count. Thanks for your help. – queserasera Apr 16 '20 at 22:08
  • I still don't understand what you mean by "independent variables for each regression are factors taken two at a time". The only variable you have besides Y is X. How do you want to turn the values of `X` into variables? Do you want to subset and reshape each iteration? The sample input is reproducible but it's still not clear what your desired output is. What are the values you expect for this input? – MrFlick Apr 16 '20 at 22:23
  • Ok, first I want to run a regression where the factor levels A and B are predictors and the corresponding Y values are the DV, then I want to run another regression with factor levels A and C as predictors and their corresponding Y values as the DV, and so on. – queserasera Apr 16 '20 at 22:25
  • Data in the example has only one factor with 3 levels: A, B, C. Do you need to perform a regression of Y vs X with two out of the three levels, i.e. 3 regressions on subsets of Data: Data[Data$X != levels(Data$X)[i], ] for i = 1, 2, and 3? – Marcelo Fernando Befumo Apr 16 '20 at 23:47

1 Answer

For linear regression:

> combos <- as.data.frame(x = combn(x = c("A", "B", "C"), m = 2))
names(combos) <- sapply(
  X = 1:3, 
  FUN = function(x){paste(combos[,x], collapse = "")}
)

> fit.list <- list()
> for (combo in names(combos)){
  fit.list[[combo]] <- subset(Data, X %in% combos[,combo]) %>% 
    lm(formula = .$Y ~ .$X, data = .)
}

> fit.list
$AB

Call:
lm(formula = .$Y ~ .$X, data = .)

Coefficients:
(Intercept)         .$XB  
       10.4          3.6  


$AC

Call:
lm(formula = .$Y ~ .$X, data = .)

Coefficients:
(Intercept)         .$XC  
       10.4         -2.9  


$BC

Call:
lm(formula = .$Y ~ .$X, data = .)

Coefficients:
(Intercept)         .$XC  
       14.0         -6.5 
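
As a side note, the same idea can be sketched in base R with the formula written against column names (Y ~ X) rather than .$Y ~ .$X, so the coefficients are labelled XB / XC. This is only a sketch under a few assumptions: the names pairs and fit_list are mine, and as.character() is there so it behaves the same whether X is stored as a character or a factor. The numbers you get depend on your R version's sample(), so no output is shown.

```r
# Sketch: fit Y ~ X on each pair of levels of X, using base R only.
set.seed(123)
Data <- data.frame(
  X = sample(c("A", "B", "C"), 20, replace = TRUE),
  Y = sample(1:20)
)

# All unordered pairs of the observed levels, as a list of character vectors
lv <- sort(unique(as.character(Data$X)))
pairs <- combn(lv, 2, simplify = FALSE)
names(pairs) <- vapply(pairs, paste, character(1), collapse = "")

# One model per pair; each model only sees rows whose X is in that pair
fit_list <- lapply(pairs, function(p) {
  lm(Y ~ X, data = Data[Data$X %in% p, ])
})

# Coefficients per pair, kept as a list because the names differ by pair
lapply(fit_list, coef)
```

Because combn() returns unordered pairs, A&B and B&A are the same fit here; swapping the order would only change which level is used as the reference.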

EDIT: To add a covariate (e.g. Z), one way is to add it as a new column in the data.frame and then include that column in the model formula:

> set.seed(123)
> Data <- data.frame(
  X = sample(c("A", "B", "C"), 20, replace = TRUE),
  Y = sample(1:20),
  Z = sample(1:20)
)

> fit.list <- list()
> for (combo in names(combos)){
  fit.list[[combo]] <- subset(Data, X %in% combos[,combo]) %>% 
    lm(formula = .$Y ~ .$X + .$Z, data = .)
}
> fit.list
$AB

Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)

Coefficients:
(Intercept)         .$XB          .$Z  
     3.6697       3.1921       0.5099  


$AC

Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)

Coefficients:
(Intercept)         .$XC          .$Z  
     7.9306      -1.5063       0.1871  


$BC

Call:
lm(formula = .$Y ~ .$X + .$Z, data = .)

Coefficients:
(Intercept)         .$XC          .$Z  
   14.89888     -7.02970     -0.06421 
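
If more covariates are likely to follow, it may be easier to pass the formula in as an argument instead of editing the loop each time. A minimal sketch, assuming a helper I am calling fit_pairs() (hypothetical, not from any package) with a by argument naming the factor column:

```r
# Sketch: pairwise-level regressions with the model formula as a parameter.
set.seed(123)
Data <- data.frame(
  X = sample(c("A", "B", "C"), 20, replace = TRUE),
  Y = sample(1:20),
  Z = sample(1:20)
)

# fit_pairs() is a hypothetical helper: for every pair of levels of the
# column named in `by`, subset the data to those levels and fit `formula`.
fit_pairs <- function(data, formula, by = "X") {
  lv <- sort(unique(as.character(data[[by]])))
  pairs <- combn(lv, 2, simplify = FALSE)
  names(pairs) <- vapply(pairs, paste, character(1), collapse = "")
  lapply(pairs, function(p) {
    lm(formula, data = data[data[[by]] %in% p, , drop = FALSE])
  })
}

# Same pairs as before, now with Z as a covariate
fits_z <- fit_pairs(Data, Y ~ X + Z)
lapply(fits_z, coef)
```

Dropping the covariate again is then just fit_pairs(Data, Y ~ X).
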
Obim