I have two data frames:

x = data.frame(Var1= c("A", "B", "C", "D","E"),Var2=c("F","G","H","I","J"),
    Value= c(11, 12, 13, 14,18))

y = data.frame(A= c(11, 12, 13, 14,18), B= c(15, 16, 17, 14,18),C= c(17, 22, 23, 24,18), D= c(11, 12, 13, 34,18),E= c(11, 5, 13, 55,18),  F= c(8, 12, 13, 14,18),G= c(7, 5, 13, 14,18),
    H= c(8, 12, 13, 14,18), I= c(9, 5, 13, 14,18), J= c(11, 12, 13, 14,18))

Var3 <- rep("time", each=length(x$Var1))

x=cbind(x,Var3)

time=seq(1:length(y[,1]))
y=cbind(y,time)

> x
  Var1 Var2 Value Var3
1    A    F    11 time
2    B    G    12 time
3    C    H    13 time
4    D    I    14 time
5    E    J    18 time
> y
   A  B  C  D  E  F  G  H  I  J time
1 11 15 17 11 11  8  7  8  9 11    1
2 12 16 22 12  5 12  5 12  5 12    2
3 13 17 23 13 13 13 13 13 13 13    3
4 14 14 24 34 55 14 14 14 14 14    4
5 18 18 18 18 18 18 18 18 18 18    5

Looking at the first row of x, I have the variables A and F. I want to select these two columns in y, fit a simple regression, lm(A ~ F, data = y), and save the result in the first position of a list. I then want to do the same for the second row of x, fitting lm(B ~ G, data = y), and so on for the remaining rows.

How can I match the variable names in x to the columns of y for a regression?


Revised question: how about a more complicated regression Var1 ~ Var2 + Var3?

Zheyuan Li
Laura

1 Answer

x = data.frame(Var1= c("A", "B", "C", "D","E"),
               Var2=c("F","G","H","I","J"),
               Value= c(11, 12, 13, 14,18))

y = data.frame(A= c(11, 12, 13, 14,18),
               B= c(15, 16, 17, 14,18),
               C= c(17, 22, 23, 24,18),
               D= c(11, 12, 13, 34,18),
               E= c(11, 5, 13, 55,18),
               F= c(8, 12, 13, 14,18),
               G= c(7, 5, 13, 14,18),
               H= c(8, 12, 13, 14,18), 
               I= c(9, 5, 13, 14,18),
               J= c(11, 12, 13, 14,18))

We can build each formula with reformulate and fit one model per row of x with Map:

fitmodel <- function (RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
                                              data = quote(y)))

modList <- Map(fitmodel, as.character(x$Var2), as.character(x$Var1))

modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F, data = y)
#
#Coefficients:
#(Intercept)            F  
#     4.3500       0.7115  

Remarks:

  1. do.call makes sure that reformulate(RHS, LHS) is evaluated before it is passed to lm, so the stored call records the actual formula (A ~ F) rather than the unevaluated expression. This is desirable because it lets functions like update work correctly on the model object. See "Showing string in formula and not as variable in lm fit". For comparison:

    oo <- Map(function (RHS, LHS) lm(reformulate(RHS, LHS), data = y),
              as.character(x$Var2), as.character(x$Var1))
    oo[[1]]
    #Call:
    #lm(formula = reformulate(RHS, LHS), data = y)
    #
    #Coefficients:
    #(Intercept)            F  
    #     4.3500       0.7115  
    
  2. The as.character calls on x$Var1 and x$Var2 are necessary when these columns are "factor" variables rather than strings, because reformulate only accepts character vectors. If you pass stringsAsFactors = FALSE to data.frame when building x (the default since R 4.0.0), the issue does not arise.
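The difference described in remark 1 can be checked by inspecting the call stored in each model. This is a minimal sketch on a trimmed copy of the data (only columns A and F); the names eager, lazy and lazy_fit are made up for illustration, and the last step assumes a fresh session with no global objects called RHS or LHS.

```r
# Trimmed copy of the example data: only the columns needed for lm(A ~ F).
y <- data.frame(A = c(11, 12, 13, 14, 18),
                F = c(8, 12, 13, 14, 18))

# Formula is built first, then handed to lm through do.call (as in fitmodel).
eager <- do.call("lm", list(formula = reformulate("F", "A"), data = quote(y)))

# Formula is built inside the lm call; `lazy_fit` is a hypothetical helper.
lazy_fit <- function(RHS, LHS) lm(reformulate(RHS, LHS), data = y)
lazy <- lazy_fit("F", "A")

deparse(eager$call$formula)  # "A ~ F"
deparse(lazy$call$formula)   # "reformulate(RHS, LHS)"

# update() re-evaluates the stored call, here on the first four rows only.
refit <- update(eager, data = y[1:4, ])
nobs(refit)  # 4

# The stored call of `lazy` still refers to RHS and LHS, which only existed
# inside lazy_fit, so re-evaluating it is expected to fail.
lazy_refit <- tryCatch(update(lazy, data = y[1:4, ]), error = function(e) e)
inherits(lazy_refit, "error")
```

Both models have identical coefficients; only the recorded call differs, which matters as soon as something re-evaluates that call.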

Does it work for you? Isn't it supposed to have a "for" loop?

The Map function hides that "for" loop. It is a wrapper around the mapply function. The *apply family of functions in R is syntactic sugar for such loops.
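For comparison, the same fits can be written with an explicit loop. This is only a sketch on a trimmed two-row copy of the data, with x built using stringsAsFactors = FALSE so that as.character is not needed.

```r
# Trimmed copy of the example data (first two rows of x, matching columns of y).
x <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"),
                stringsAsFactors = FALSE)
y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                F = c(8, 12, 13, 14, 18),  G = c(7, 5, 13, 14, 18))

# Pre-allocate the result list, then fit one model per row of x.
modList <- vector("list", nrow(x))
for (i in seq_len(nrow(x))) {
  f <- reformulate(x$Var2[i], response = x$Var1[i])
  modList[[i]] <- do.call("lm", list(formula = f, data = quote(y)))
}

# Map names its result after its first vector argument; mimic that here.
names(modList) <- x$Var2

modList[["G"]]$call
```

The loop and the Map call do the same work; Map just handles the iteration, result collection and naming.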


Update on your revised question

Your original question constructs a model formula of the form Var1 ~ Var2.

Your new question wants Var1 ~ Var2 + Var3.

x$Var3 <- rep("time", nrow(x))
y$time <- seq_len(nrow(y))

## collect multiple RHS variables (using concatenation function `c`)
RHS <- Map(base::c, as.character(x$Var2), as.character(x$Var3))
#str(RHS)
#List of 5  ## oh this list has names! annoying!!
# $ F: chr [1:2] "F" "time"
# $ G: chr [1:2] "G" "time"
# $ H: chr [1:2] "H" "time"
# $ I: chr [1:2] "I" "time"
# $ J: chr [1:2] "J" "time"
LHS <- as.character(x$Var1)
modList <- Map(fitmodel, RHS, LHS)  ## `fitmodel` function unchanged
modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F + time, data = y)
#
#Coefficients:
#(Intercept)            F         time  
#        5.6          0.5          0.5  
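If more predictor columns are added to x later, the right-hand side can also be collected row by row instead of with Map(base::c, ...). This is a sketch under assumptions: rhs_cols and fit_row are hypothetical names, x is built with stringsAsFactors = FALSE, and the data is a trimmed two-row copy of the example.

```r
# Trimmed copy of the example data, with the extra predictor column Var3.
x <- data.frame(Var1 = c("A", "B"), Var2 = c("F", "G"), Var3 = "time",
                stringsAsFactors = FALSE)
y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                F = c(8, 12, 13, 14, 18),  G = c(7, 5, 13, 14, 18))
y$time <- seq_len(nrow(y))

# Hypothetical: the columns of x that hold predictor names.
rhs_cols <- c("Var2", "Var3")

# Fit the model described by row i of x.
fit_row <- function(i) {
  rhs <- unname(unlist(x[i, rhs_cols]))  # e.g. c("F", "time")
  do.call("lm", list(formula = reformulate(rhs, x$Var1[i]),
                     data = quote(y)))
}

modList <- lapply(seq_len(nrow(x)), fit_row)
names(modList) <- x$Var1  # name each model by its response variable

modList[["A"]]$call
```

Naming the list by response rather than by the first predictor is a design choice; it avoids the predictor-based names that Map produces.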
Zheyuan Li