
I am trying to carry out a MANOVA. There are 7 dependent variables and a categorical independent variable representing 6 groups.

The data are available here: http://pastebin.com/fqXNjWtr

Click "download" above the text. I am reading the file into R like this (the downloaded file should have the same name for you; I'm on macOS):

> df <- read.csv("~/downloads/fqXNjWtr.txt", stringsAsFactors = FALSE)
> str(df)

'data.frame':   244 obs. of  8 variables:
 $ var1              : num  0.3 0 0.312 0 0.643 ...
 $ var2              : num  0 0.125 0 0.375 0.0714 ...
 $ var3              : num  0 0.0625 0.0625 0 0.0714 ...
 $ var4              : num  0.2 0.3125 0.0625 0.0625 0 ...
 $ var5              : num  0.1 0.25 0.438 0.188 0 ...
 $ var6              : num  0.2 0.0625 0.125 0.0625 0.0714 ...
 $ var7              : num  0.2 0.188 0 0.312 0.143 ...
 $ cluster_assignment: int  1 4 2 6 1 4 3 3 4 6 ...

I am then creating the dependent variable, DV:

> df$DV <- as.matrix(df[, 1:7])

I am then carrying out the MANOVA:

> mv_out <- manova(DV ~ cluster_assignment, data = df)
> mv_out
Call:
   manova(DV ~ cluster_assignment, data = df)

Terms:
                cluster_assignment Residuals
resp 1                    5.160838  6.738524
resp 2                    3.384101  3.622020
resp 3                    0.000200  3.365565
resp 4                    0.065469  2.743549
resp 5                    0.889180  8.019733
resp 6                    0.442187  5.884827
resp 7                    3.133188  7.736993
Deg. of Freedom                  1       242

Residual standard errors: 0.1668686 0.1223398 0.1179292 0.1064752 0.1820423 0.1559406 0.1788045
Estimated effects may be unbalanced

When I then try the summary() function, I get this error:

> summary(mv_out)
Error in summary.manova(mv_out) : residuals have rank 6 < 7

Based on some other posts, this error seems to suggest either that there are too few observations for the number of variables, or that some of the variables are collinear. Neither seems to be the case with these data:

> cor(df[, 1:7])

            var1         var2        var3         var4        var5        var6       var7
var1  1.00000000 -0.417605243 -0.05274197 -0.118358341 -0.25617705  0.06089533 -0.4360312
var2 -0.41760524  1.000000000 -0.07181878  0.008873035 -0.29523300 -0.33954011  0.1958746
var3 -0.05274197 -0.071818782  1.00000000  0.131137673 -0.11624079 -0.14408909 -0.2951076
var4 -0.11835834  0.008873035  0.13113767  1.000000000 -0.14361455 -0.24308229 -0.1491373
var5 -0.25617705 -0.295233000 -0.11624079 -0.143614554  1.00000000 -0.03180183 -0.2383027
var6  0.06089533 -0.339540114 -0.14408909 -0.243082287 -0.03180183  1.00000000 -0.3215075
var7 -0.43603124  0.195874568 -0.29510761 -0.149137349 -0.23830275 -0.32150753  1.0000000

I'm puzzled about what may be going on.

Joshua Rosenberg

2 Answers

You can get past this error by setting the 'tol' argument of summary.manova (see ?summary.manova). df$DV fails the rank-deficiency check at the default tol = 1e-7 because every row sums to 1. This might not produce the results you intended, though: note the negative approx F in the output below.

summary(mv_out,tol=0)
                       Df Pillai approx F num Df den Df Pr(>F)
df$cluster_assignment   1 1.2106  -193.79      7    236       
Residuals             242     
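
Separately from the rank issue, the Df of 1 in the tables above is what R reports when the grouping variable is stored as a number rather than as a factor. A minimal sketch on made-up data (the names `Y` and `g` are hypothetical and not part of the question's data) showing how the degrees of freedom change once the codes are wrapped in factor():

```r
# Made-up data: three unconstrained responses and integer group codes 1..6,
# mimicking how cluster_assignment is stored as int in the question.
set.seed(3)
n <- 244
Y <- matrix(rnorm(n * 3), nrow = n)
g <- sample(1:6, n, replace = TRUE)

# Numeric predictor: a single slope per response.
df_numeric <- summary(manova(Y ~ g))$stats[1, "Df"]

# Factor predictor: one parameter per group beyond the baseline.
df_factor <- summary(manova(Y ~ factor(g)))$stats[1, "Df"]

c(numeric = df_numeric, factor = df_factor)
```

If the six clusters are meant as unordered groups, the same idea would apply to the real data via factor(cluster_assignment) in the model formula.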
danielson
  • According to `?summary.manova`, "[tolerance to be used in deciding if the residuals are rank-deficient] can be reduced but doing so will allow rather inaccurate results and it will normally be better to transform the responses to remove the high correlation." In this case, what type of transformation might be appropriate? Why is rowSums = 1 the issue? Thanks. – Joshua Rosenberg Sep 11 '16 at 14:42
  • Sorry, I could have explained that better. rowSums=1 implies that the columns of df$DV are not linearly independent, which is an assumption in manova. One column is always dependent on the other 6; for example, df$var7 = 1 - rowSums(df$DV[, 1:6]), except for some small rounding errors. The estimates should be stable from `print(mv_out)` using your current code. For a better F-statistic, you can run the test in `summary.manova` with any 6 of the dependent variables. – danielson Sep 11 '16 at 18:58
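
The point made in these comments can be reproduced without the original file. The following is only a sketch on synthetic proportions (the data, `cluster`, `fit7`, and `fit6` are made up for illustration): it shows that pairwise correlations can look unremarkable while the residual matrix still has rank 6, and that dropping any one part removes the problem.

```r
# Synthetic stand-in for the real data: 7 proportions per row that sum to 1.
set.seed(1)
n <- 244
raw <- matrix(rexp(n * 7), nrow = n)               # strictly positive values
DV <- raw / rowSums(raw)                           # rescale each row to sum to 1
colnames(DV) <- paste0("var", 1:7)
cluster <- factor(sample(1:6, n, replace = TRUE))  # hypothetical grouping

# No single pair of columns is close to perfectly correlated ...
range(rowSums(DV))
max(abs(cor(DV)[upper.tri(cor(DV))]))

# ... yet the seven residual columns span only six dimensions,
# which is the condition summary.manova() complains about.
fit7 <- manova(DV ~ cluster)
qr(residuals(fit7))$rank

# Dropping any one part leaves six linearly independent responses.
fit6 <- manova(DV[, -7] ~ cluster)
summary(fit6)
```

Which column is dropped should not matter for the multivariate test, since the omitted part is an exact linear function of the remaining six.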

The DVs are not full rank: rowSums(df$DV) shows that the values in each row add up to a constant. As danielson pointed out, this violates the MANOVA assumptions. Data with this "parts-of-a-whole" structure are often referred to as compositional data. You can find tools and learn more about them at the following website: http://www.compositionaldata.com/

As a quick solution, I recommend applying an isometric log-ratio transformation (for instance, the ilr function in the compositions package in R) to the DV before fitting the MANOVA model. This should prevent the error message and address the MANOVA assumption issue.

library(compositions)
mv_out <- manova(ilr(clo(DV)) ~ cluster_assignment, data = df)
summary(mv_out)

This should give you a fair solution.
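
To make the transformation less of a black box, here is a rough base-R sketch of an isometric log-ratio transform on synthetic, strictly positive proportions. The helper `ilr_sketch` and the data are hypothetical; it builds its orthonormal basis from Helmert contrasts, which is an assumption of this sketch and may differ from the basis used by compositions::ilr, so individual coordinates need not match. It also assumes no part is exactly zero, because the logarithm of zero is undefined; zeros would need to be handled before transforming.

```r
# Synthetic compositions: 7 strictly positive parts per row, rows sum to 1.
set.seed(2)
n <- 244
raw <- matrix(rexp(n * 7), nrow = n)
comp <- raw / rowSums(raw)
cluster <- factor(sample(1:6, n, replace = TRUE))  # hypothetical grouping

# Hypothetical helper: ilr coordinates from an orthonormal Helmert basis.
ilr_sketch <- function(x) {
  D <- ncol(x)
  clr <- log(x) - rowMeans(log(x))           # centred log-ratio, rows sum to 0
  V <- contr.helmert(D)                      # D x (D - 1), orthogonal columns
  V <- sweep(V, 2, sqrt(colSums(V^2)), "/")  # scale columns to unit length
  clr %*% V                                  # D - 1 unconstrained coordinates
}

Z <- ilr_sketch(comp)
dim(Z)       # one column fewer than the number of parts
qr(Z)$rank   # rank of the transformed response matrix

fit <- manova(Z ~ cluster)
summary(fit)
```

Because the transform maps seven constrained parts to six unconstrained coordinates, the sum-to-one constraint no longer makes the response matrix rank deficient.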

Andrew M
Xben