
I would like to calculate the correlation between latent and observed variables using lavaan in R.

Here's a simple example of what I'm trying to do. We have some data and a lavaan model.

library(psych)  # bfi ships with psych (psychTools in newer releases)
data(bfi)
names(bfi) <- tolower(names(bfi))
mod <- "
 agree =~ a1 + a2 + a3 + a4 + a5
 consc =~ c1 + c2 + c3 + c4 + c5
 age ~~ agree 
 age ~~ consc
"
lavaan::cfa(mod, bfi)

agree is a latent variable with 5 indicators. age is an observed variable, and I want the correlation between the observed variable age and the latent variable agree. The general way of specifying a covariance in lavaan is to put ~~ between the two variables, but this doesn't seem to work when one of the variables is observed.

When I run the above, I get the following error:

Error in lav_model(lavpartable = lavpartable, representation = lavoptions$representation,  : 
  lavaan ERROR: parameter is not defined: agree ~~ age

In other SEM software, such as Amos, you'd just draw a double-headed arrow between the latent and observed variables.

How do you include correlations between latent and observed variables in lavaan?

Jeromy Anglim

2 Answers


One workaround that seems to work is to trick lavaan into thinking an observed variable is a factor:

library(psych)  # bfi ships with psych (psychTools in newer releases)
data(bfi)
names(bfi) <- tolower(names(bfi))
mod <- "
 agree =~ a1 + a2 + a3 + a4 + a5
 consc =~ c1 + c2 + c3 + c4 + c5
 agefac =~ age
 agefac ~~ agree
 agefac ~~ consc
"
lavaan::cfa(mod, bfi)

That is, agefac is a latent version of age, but because age is its only indicator and that indicator's loading is fixed to 1 (lavaan also fixes the residual variance of a single indicator to 0 by default), agefac is identical to the observed age variable. You can then correlate this quasi-latent variable with the actual latent variables.
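To read the correlations off the fitted model, something like the following may work. This is only a sketch, and it makes substitutions that are not part of the example above: it uses lavaan's bundled HolzingerSwineford1939 data instead of bfi so that it runs without psych, with ageyr standing in for age and visual/textual standing in for agree/consc.

```r
library(lavaan)

# Assumption: HolzingerSwineford1939 (bundled with lavaan) replaces bfi.
# ageyr plays the role of age; visual and textual play agree and consc.
mod <- "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  agefac  =~ ageyr       # single-indicator stand-in for the observed variable
  agefac ~~ visual
  agefac ~~ textual
"
fit <- cfa(mod, data = HolzingerSwineford1939)

# Model-implied correlation matrix of the latent variables, agefac included
lv_cor <- lavInspect(fit, "cor.lv")
lv_cor["agefac", c("visual", "textual")]

# The same correlations with standard errors, from the standardized solution
std <- standardizedSolution(fit)
age_cors <- std[std$op == "~~" &
                std$lhs != std$rhs &
                (std$lhs == "agefac" | std$rhs == "agefac"), ]
age_cors
```

lavInspect() gives the whole latent correlation matrix at once, while standardizedSolution() adds standard errors and confidence intervals for each correlation.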

Jeromy Anglim

If the model isn't going to change, you can regress your observed variable on the latent. The resulting standardised regression coefficient will be equivalent to a correlation between the latent and a "quasi-latent" as described by @Jeromy. For example:

mod <- "
  agree =~ a1 + a2 + a3 + a4 + a5
  age ~ agree  # regression instead of correlation
"
summary(lavaan::cfa(mod, bfi), standardized = TRUE)  # no pipe, so magrittr is not needed

The standardized regression coefficient of age on agree will be the same whether you run this or the model described by @Jeromy. Note, however, that the unstandardized coefficient will not be the same.
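One way to check this equivalence is to fit both models and compare the two standardized estimates. The sketch below makes substitutions of its own: it uses lavaan's bundled HolzingerSwineford1939 data rather than bfi, with ageyr standing in for age and visual for agree.

```r
library(lavaan)

# Assumption: HolzingerSwineford1939 (bundled with lavaan) replaces bfi;
# ageyr stands in for age and visual for agree.
reg_mod <- "
  visual =~ x1 + x2 + x3
  ageyr ~ visual         # regress the observed variable on the latent
"
cor_mod <- "
  visual =~ x1 + x2 + x3
  agefac =~ ageyr        # single-indicator workaround
  agefac ~~ visual
"
reg_std <- standardizedSolution(cfa(reg_mod, data = HolzingerSwineford1939))
cor_std <- standardizedSolution(cfa(cor_mod, data = HolzingerSwineford1939))

# Standardized regression coefficient from the first model
beta <- reg_std$est.std[reg_std$op == "~"]

# Latent correlation from the second model (its only off-diagonal covariance)
r <- cor_std$est.std[cor_std$op == "~~" & cor_std$lhs != cor_std$rhs]

c(std_beta = beta, latent_cor = r)
```

With a single latent predictor the two values should agree up to numerical tolerance; with several latent predictors the regression coefficients become partial effects and no longer match the correlations.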

Simon Jackson
  • Thanks for that. Alas, in my attempt to make my example simple, I probably haven't communicated the broader use case, i.e., when you want to correlate multiple latent variables with an observed variable. I've edited to try to make this clearer. I imagine the above would only be true when you have one latent variable mapped to one observed variable. – Jeromy Anglim Jul 25 '16 at 06:13
  • Ah, I see. Yes, regressing the observed variable on multiple latents will not give you what you want. Best to stick with your solution :) – Simon Jackson Jul 25 '16 at 06:25