
I have created a questionnaire composed of four subscales, each measuring a different component of my variable of interest. Each subscale has 3 items, and each item is rated on a 6-point scale (so responses for each item range from 1 to 6).

Here is a sample of my data; each row is a subject:

> dput(DF[1:10, 7:18 ]) 
structure(list(I1 = c(3, 6, 6, 4, 5, 5, 3, 3, 5, 4), I2 = c(3, 
5, 5, 6, 4, 5, 2, 5, 5, 4), I3 = c(1, 4, 2, 3, 3, 4, 4, 1, 5, 
2), I4 = c(5, 6, 6, 6, 5, 6, 6, 6, 6, 6), I5 = c(5, 6, 5, 5, 
6, 6, 5, 6, 5, 5), I6 = c(4, 6, 6, 6, 5, 5, 6, 4, 5, 4), I7 = c(3, 
6, 5, 6, 4, 4, 3, 5, 3, 4), I8 = c(4, 6, 5, 5, 4, 4, 3, 5, 3, 
5), I9 = c(4, 6, 4, 4, 5, 5, 5, 4, 4, 3), I10 = c(2, 4, 5, 6, 
3, 2, 4, 1, 2, 4), I11 = c(3, 3, 4, 6, 4, 6, 5, 5, 2, 3), I12 = c(3, 
6, 6, 6, 5, 4, 4, 4, 5, 5)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

217 participants completed this questionnaire (no missing values), and I want to test whether my data support my model with a CFA.

Here is my code:

library(lavaan)

model <- "
Factor1 =~ I1 + I2 + I3
Factor2 =~ I4 + I5 + I6
Factor3 =~ I7 + I8 + I9
Factor4 =~ I10 + I11 + I12
"

fit <- cfa(model, data = DF)
summary(fit, fit.measures = TRUE, standardized = TRUE)

But when I run it, I get the following warnings and I can't understand why:

lavaan WARNING: the optimizer warns that a solution has NOT been found!
lavaan WARNING: the optimizer warns that a solution has NOT been found!
lavaan WARNING: Could not compute standard errors! The information matrix could not be inverted. This may be a symptom that the model is not identified.
lavaan WARNING: some estimated ov variances are negative
lavaan WARNING: covariance matrix of latent variables
is not positive definite; use lavInspect(fit, "cov.lv") to investigate.

Here is what I get with lavInspect:

> lavInspect(fit, "cov.lv")
        Factr1   Factr2   Factr3   Factr4  
Factor1 7797.062                           
Factor2    0.248    0.451                  
Factor3    0.215    0.182    0.289         
Factor4   -0.254   -0.159    0.280 9883.238

The huge variances for Factor1 and Factor4 seem to go together with the very large negative variances that lavaan displays for I1 (-7795.413) and I10 (-9881.204), but if I ask R directly for var(DF$I1) and var(DF$I10), the results are very different.

Variances:
                   Estimate   Std.Err  z-value  P(>|z|)   Std.lv   Std.all 
   .I1             -7795.413       NA                   -7795.413 -4729.827
   .I2                 1.684       NA                       1.684     1.000
   .I3                 1.535       NA                       1.535     1.000
   .I4                 0.807       NA                       0.807     0.641
   .I6                 1.859       NA                       1.859     0.884
   .I7                 1.370       NA                       1.370     0.826
   .I8                 1.201       NA                       1.201     0.832
   .I9                 1.681       NA                       1.681     0.950
   .I10            -9881.204       NA                   -9881.204 -4859.350
   .I11                2.215       NA                       2.215     1.000
   .I12                0.784       NA                       0.784     1.000


> var(DF$I1)
[1] 1.683052
> var(DF$I10)
[1] 1.966163



 

Does anyone know why it is not working? Is it because my model doesn't fit my data well enough?

Thank you in advance!

  • Did you use `lavInspect(fit, "cov.lv")`? What is the output? Can you show the data? – Tom Sep 28 '20 at 06:38
  • Thank you for your answer Tom, I will put those data on my post. – Lea_c Sep 28 '20 at 08:02
  • Maybe the four-factor structure does not actually underlie the data. Did you inspect `cor(DF)`? (Actually, you could have provided us with the covariance matrix rather than the raw data.) E.g., in the sample data `I11` and `I12` show a negative correlation. Also, I think it is an artefact of estimation order that `I1` and `I10` have large negative values; if you place another item first for the respective factors, I suspect that those items will turn out to be largely negative instead. However, this question might more appropriately be addressed at [stats.stackexchange](https://stats.stackexchange.com/) – Tom Sep 28 '20 at 14:47
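
The correlation check suggested in the comment above can be sketched as follows. This is only an illustration on the first 10 rows of subscale 4, copied from the `dput()` sample in the question; the names `sample4`, `r4`, and `neg_pairs` are made up for this example, and 10 rows say little about the full sample of 217.

```r
# First 10 rows of subscale 4, copied from the dput() sample in the question
sample4 <- data.frame(
  I10 = c(2, 4, 5, 6, 3, 2, 4, 1, 2, 4),
  I11 = c(3, 3, 4, 6, 4, 6, 5, 5, 2, 3),
  I12 = c(3, 6, 6, 6, 5, 4, 4, 4, 5, 5)
)

# Item intercorrelations within the subscale
r4 <- cor(sample4)
print(round(r4, 2))

# List item pairs (upper triangle only) that correlate negatively
neg <- which(r4 < 0 & upper.tri(r4), arr.ind = TRUE)
neg_pairs <- data.frame(
  item_a = rownames(r4)[neg[, "row"]],
  item_b = colnames(r4)[neg[, "col"]]
)
print(neg_pairs)
```

On the full data the same check would be run on `cor(DF[, c("I10", "I11", "I12")])`, and likewise for the other three subscales. Items that are meant to load on one factor but correlate weakly or negatively with each other are a plausible reason for the optimizer failing to converge.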

1 Answer

Have a look at this lavaan discussion. Having factor variances in the thousands and others lower than 1 tends to be problematic for the estimation process.

I assume that some variables (esp. those of factors 1 and 4) range from 1 to, say, 50, whereas others might range from 1 to 5. If this is the case, I suggest that you rescale your variables to a common range prior to the CFA estimation, e.g.,

vars <- c("I1", "I2", "I3", "I10", "I11", "I12")
DF[, vars] <- DF[, vars] / 10
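
Whether rescaling is needed at all can be checked first by comparing item ranges and variances. A minimal sketch, using three items from the 10-row `dput()` sample in the question; `sample_items`, `item_ranges`, `item_vars`, and `variance_ratio` are names invented for this illustration:

```r
# Three items from the 10-row dput() sample in the question
sample_items <- data.frame(
  I1  = c(3, 6, 6, 4, 5, 5, 3, 3, 5, 4),
  I4  = c(5, 6, 6, 6, 5, 6, 6, 6, 6, 6),
  I10 = c(2, 4, 5, 6, 3, 2, 4, 1, 2, 4)
)

# Minimum and maximum of each item
item_ranges <- sapply(sample_items, range)
rownames(item_ranges) <- c("min", "max")
print(item_ranges)

# Observed variances and the ratio of the largest to the smallest
item_vars <- sapply(sample_items, var)
variance_ratio <- max(item_vars) / min(item_vars)
print(round(item_vars, 3))
print(round(variance_ratio, 1))
```

If every item stays within the same 1 to 6 range and the variance ratio is nowhere near the thousands, differing measurement scales are unlikely to be the cause, and dividing by 10 would not be expected to help.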
  • I tried to read all the lavaan-related discussions before posting my question, including this one, but it did not help me. All variables range from 1 to 6, and there is no notable difference between the item variances. – Lea_c Sep 28 '20 at 10:19
  • I referred to that specific discussion because, in the second posting, Terrence Jorgensen concisely describes why it might be a bad idea to have variances that are at least a thousand times larger than others. – Tom Sep 28 '20 at 10:23
  • Oh I see. Actually, the variances of item 1 and item 10 are huge in lavaan (-7795.413 and -9881.204, respectively), but this is an error because var(DF$I1) = 1.683052 and var(DF$I10) = 1.966163. – Lea_c Sep 28 '20 at 10:26
  • I added a sample to my question (I didn't know how to share it) – Lea_c Sep 28 '20 at 10:44
  • Using `dput` on your sample data, you can get an ASCII text representation that you can copy and paste into your question; see also [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Tom Sep 28 '20 at 11:45
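
On the mismatch between lavaan's numbers and `var()` discussed in the comments: in lavaan's summary, a leading dot (as in `.I1`) marks a residual variance, not the observed variance of the item. The sketch below assumes that `I1` and `I10` are the marker items whose loadings lavaan fixes to 1 by default (they are listed first for their factors), and it uses only values copied from the output in the question; the variable names are made up for this illustration.

```r
# Values copied from the lavaan output shown in the question
factor_var_F1  <- 7797.062   # Factor1 variance, from lavInspect(fit, "cov.lv")
factor_var_F4  <- 9883.238   # Factor4 variance
resid_var_I1   <- -7795.413  # residual variance reported for .I1
resid_var_I10  <- -9881.204  # residual variance reported for .I10
marker_loading <- 1          # assumed: lavaan's default for the first indicator

# Model-implied item variance = loading^2 * factor variance + residual variance
implied_var_I1  <- marker_loading^2 * factor_var_F1 + resid_var_I1
implied_var_I10 <- marker_loading^2 * factor_var_F4 + resid_var_I10

print(round(implied_var_I1, 3))   # compare with var(DF$I1)  = 1.683052
print(round(implied_var_I10, 3))  # compare with var(DF$I10) = 1.966163
```

Under that assumption the two huge values largely cancel, so the model-implied variances land in the same ballpark as the `var()` results. The estimates themselves still come from a run that did not converge, so they should not be interpreted beyond this.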