I am trying to perform a multivariate multiple regression on my data, to find out whether any of the independent variables has a significant effect on any of the dependent variables.

I have two independent variables (expertise (three levels), and version (two levels)) and up to six dependent, continuous variables.

Admittedly, I am quite the noob in both R and statistics, and I can't work out what causes the error that the following code produces:

#MULTIVARIATE MULTIPLE REGRESSION --------
m1 <- lm(cbind(subjectDictCount, overlapCount, relativeOverlap, relativeSize, twoGramsCount, twoGramsOverlap) ~ version + expertise, data=manovalijst)
library(car)  # provides Anova()
summary(Anova(m1))

Error in eigen(qr.coef(SSPE.qr, x$SSPH), symmetric = FALSE) : infinite or missing values in 'x'

My complete dataset has 67 rows (4 novices, 40 experts, and 23 intermediates; roughly a 50/50 split between versionA and versionB). A sample of my data looks like this:

> dput(manovalijst[c(1:4, 41:43, 65:67),])
structure(list(version = c("versionB", "versionA", "versionB", 
"versionB", "versionA", "versionB", "versionA", "versionB", "versionA", 
"versionA"), expertise = c("expert", "expert", "expert", "expert", 
"intermediate", "intermediate", "intermediate", "novice", "novice", 
"novice"), subjectDictCount = c(12, 53, 52, 33, 38, 27, 23, 40, 
23, 24), overlapCount = c(8, 47, 14, 23, 23, 16, 11, 13, 11, 
14), relativeOverlap = c(0.666666667, 0.886792453, 0.269230769, 
0.696969697, 0.605263158, 0.592592593, 0.47826087, 0.325, 0.47826087, 
0.583333333), relativeSize = c(0.184615385, 0.815384615, 0.8, 
0.507692308, 0.584615385, 0.415384615, 0.353846154, 0.615384615, 
0.353846154, 0.369230769), twoGramsCount = c(11, 52, 51, 32, 
37, 26, 22, 39, 22, 23), twoGramsOverlap = c(1, 29, 0, 9, 6, 
1, 1, 2, 0, 0)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

I found some information that might help in this question: "Error in eigen(corr) : infinite or missing values in 'x' when making a 'Correlation matrix circles plot'", but I know that there are no NA values in my dataset. I also read that multicollinearity might affect the outcome of this type of analysis. Indeed, there are perfect correlations between three pairs of dependent variables: subjectDictCount and relativeSize, subjectDictCount and twoGramsCount, and relativeSize and twoGramsCount (because relativeSize and twoGramsCount are both derived from subjectDictCount).
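The perfect correlations described above can be checked directly. The following is a minimal sketch that uses only the ten sample rows from the dput() output, not the full 67-row dataset; the object names dv, cors, perfect, and rank_dv are made up for illustration, and the 1e-6 threshold is an arbitrary choice to absorb the rounding in the printed values.

```r
# Dependent variables from the 10 sample rows shown in the dput() output.
dv <- data.frame(
  subjectDictCount = c(12, 53, 52, 33, 38, 27, 23, 40, 23, 24),
  overlapCount     = c(8, 47, 14, 23, 23, 16, 11, 13, 11, 14),
  relativeOverlap  = c(0.666666667, 0.886792453, 0.269230769, 0.696969697,
                       0.605263158, 0.592592593, 0.47826087, 0.325,
                       0.47826087, 0.583333333),
  relativeSize     = c(0.184615385, 0.815384615, 0.8, 0.507692308,
                       0.584615385, 0.415384615, 0.353846154, 0.615384615,
                       0.353846154, 0.369230769),
  twoGramsCount    = c(11, 52, 51, 32, 37, 26, 22, 39, 22, 23),
  twoGramsOverlap  = c(1, 29, 0, 9, 6, 1, 1, 2, 0, 0)
)

# Pairwise correlations between the dependent variables.
cors <- cor(dv)

# Pairs whose correlation is numerically +1 or -1 (upper triangle only,
# so each pair is listed once and the diagonal is skipped).
perfect <- which(abs(cors) > 1 - 1e-6 & upper.tri(cors), arr.ind = TRUE)
print(data.frame(
  var1 = rownames(cors)[perfect[, "row"]],
  var2 = colnames(cors)[perfect[, "col"]]
))

# Rank of the mean-centred response matrix. A rank below ncol(dv) means
# at least one column is a linear combination of the others.
rank_dv <- qr(scale(as.matrix(dv), center = TRUE, scale = FALSE))$rank
print(c(rank = rank_dv, columns = ncol(dv)))
```

If the rank is lower than the number of columns, the dependent variables carry redundant information, which seems a plausible source of the eigen() error, though the sketch does not prove that.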

Now, what steps do I take?

Do I use a completely different test? Do I test the dependent variables separately from each other? Or is it because the assumptions have not been met (I thought they had been)?

I thought I was using the right test, but now I am starting to doubt that. It does not help that most explanations of statistics involve a lot of math and strange symbols, which are definitely not my forte.

Any help would be appreciated.

Thank you!

EDIT: I forgot to mention that I do get partial results from the code above before the error appears:

Type II MANOVA Tests:

Sum of squares and products for error:
                 subjectDictCount overlapCount relativeOverlap relativeSize twoGramsCount twoGramsOverlap
subjectDictCount       6712.34026   3738.44675     -13.4031133  103.2667732    6712.34026      1592.54416
overlapCount           3738.44675   4216.51558      58.7752987   57.5145654    3738.44675      2087.59805
relativeOverlap         -13.40311     58.77530       2.3526341   -0.2062017     -13.40311        30.77520
relativeSize            103.26677     57.51457      -0.2062017    1.5887196     103.26677        24.50068
twoGramsCount          6712.34026   3738.44675     -13.4031133  103.2667732    6712.34026      1592.54416
twoGramsOverlap        1592.54416   2087.59805      30.7752046   24.5006793    1592.54416      1429.53149

------------------------------------------
 
Term: version 

Sum of squares and products for the hypothesis:
                 subjectDictCount overlapCount relativeOverlap relativeSize twoGramsCount twoGramsOverlap
subjectDictCount        9.1727837   17.9271598     0.199953482  0.141119749     9.1727837      16.6558442
overlapCount           17.9271598   35.0365895     0.390786278  0.275802458    17.9271598      32.5519481
relativeOverlap         0.1999535    0.3907863     0.004358698  0.003076207     0.1999535       0.3630734
relativeSize            0.1411197    0.2758025     0.003076207  0.002171073     0.1411197       0.2562438
twoGramsCount           9.1727837   17.9271598     0.199953482  0.141119749     9.1727837      16.6558442
twoGramsOverlap        16.6558442   32.5519481     0.363073427  0.256243756    16.6558442      30.2435065

EDIT: I noticed that when I remove relativeSize from the model, the error does not occur. However, I still do not know why.
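One way to look into this observation is to compare the rank of the residual matrix with and without the derived variables. The sketch below is only an illustration on the ten sample rows, so its test results are not meaningful. It drops both relativeSize and twoGramsCount, which is an assumption that goes beyond what was tried in the post. It also uses base R's anova(), which gives sequential Pillai tests, rather than the Type II tests from car::Anova(). The names d, m_full, m_reduced, rank_full, and rank_reduced are hypothetical.

```r
# The 10 sample rows from the dput() output (illustration only).
d <- data.frame(
  version   = c("versionB", "versionA", "versionB", "versionB", "versionA",
                "versionB", "versionA", "versionB", "versionA", "versionA"),
  expertise = c(rep("expert", 4), rep("intermediate", 3), rep("novice", 3)),
  subjectDictCount = c(12, 53, 52, 33, 38, 27, 23, 40, 23, 24),
  overlapCount     = c(8, 47, 14, 23, 23, 16, 11, 13, 11, 14),
  relativeOverlap  = c(0.666666667, 0.886792453, 0.269230769, 0.696969697,
                       0.605263158, 0.592592593, 0.47826087, 0.325,
                       0.47826087, 0.583333333),
  relativeSize     = c(0.184615385, 0.815384615, 0.8, 0.507692308,
                       0.584615385, 0.415384615, 0.353846154, 0.615384615,
                       0.353846154, 0.369230769),
  twoGramsCount    = c(11, 52, 51, 32, 37, 26, 22, 39, 22, 23),
  twoGramsOverlap  = c(1, 29, 0, 9, 6, 1, 1, 2, 0, 0)
)

# Original model with all six responses. lm() fits it without complaint;
# the problem only shows up when the multivariate tests are computed.
m_full <- lm(cbind(subjectDictCount, overlapCount, relativeOverlap,
                   relativeSize, twoGramsCount, twoGramsOverlap)
             ~ version + expertise, data = d)

# Reduced model: keep subjectDictCount, drop the two variables derived from it.
m_reduced <- lm(cbind(subjectDictCount, overlapCount, relativeOverlap,
                      twoGramsOverlap)
                ~ version + expertise, data = d)

# Rank of each residual matrix. A rank below the number of responses means
# the error sum-of-squares-and-products matrix is singular.
rank_full    <- qr(resid(m_full))$rank
rank_reduced <- qr(resid(m_reduced))$rank
print(c(full = rank_full, reduced = rank_reduced))

# Sequential multivariate tests (Pillai's trace by default) in base R.
print(anova(m_reduced))
```

In the posted output, the subjectDictCount and twoGramsCount rows of the error matrix are identical, which is consistent with the rank deficiency this sketch looks for.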
