4

My variables are measured on a randomized block with a subsampling design where my treatments are the 23 Accesion. I have 3 complete blocks and 6 samples per block. The example dataframe has 4 response variables (LH, REN, FTT, DFR), Accesion (treatment), Bloque (Block number) and Plot (that is the variable accounting for the subsampling). The head of the data is:

  Plot Accesion Bloque   LH  REN FTT DFR
1  221       22      1 20.6 1127  23  88
2  221       22      1 20.5 1638  20  88
3  221       22      1 24.5 1319  16  88
4  221       22      1 21.4  960  17  88
5  221       22      1 25.7 1469  18  88
6  221       22      1 25.8 1658  21  88

So, the data is non-normal and heteroscedastic for almost all of the 100 response variables after all types of transformations (log, boxcox, power, etc). Most of the variables show a chi-squared or Poisson-like distribution with different variance for each Accesion.

Histogram for FTT

So far I've beeng working on running a generalized linear model-effect with Poisson using the glmer() on the FTT as a response variable. I am using this code:

FTTglme = glmer(FTT ~ Accesion + Bloque + (1|Plot), data = Lyc,   
family=poisson(link="identity"))

The residuals are non-normal according to the shapiro.test(). That, I assume, is because of the heteroscedasticity observed in the residuals. As a boxplot of the residuals by Accesion, shows the difference of variances:

Boxplot of Residuals by Accesion

The heteroscedasticity is expected between plant populations, but I know it can be modelled inside the glme. The code that I should add, as I have investigated already, is:

vf <- varIdent(form=~Accesion)

FTTglme = glmer(FTT ~ Accesion + Bloque + (1|Plot), data = Lyc,   
family=poisson(link="identity"), weights = vf)

I want the different variances to account for each Accesion category. But I keep getting the error:

Error in model.frame.default(data = Lyc, weights = varIdent(form = ~Accesion),  : 
  variable lengths differ (found for '(weights)')

Does anyone know how to account for the differences of variance between the Accesions inside the glmer()?

Any other suggestions to analyze the data is also welcome.

vestland
  • 55,229
  • 37
  • 187
  • 305

1 Answers1

1

I've already solved the issue. In glmer() the weights argument correspond to a vector of the same length that the original. To model the heteroscedaticity I generated a variance dataframe:

VAR<-aggregate(Lyc[,6],by=list(Lyc$Accesion), var)
colnames(VAR)<-c("Accesion", "Var")

That yields a dataframe Treatment/Variance, the head of it is:

 Accesion       Var
1       22  4.369281
2       23 16.251634
3       24 13.911765
4       25 15.404412
5       26 15.895833
6       27 44.838095

Then I create a new dataframe merging both dataframes

Lyc2<-merge(VAR, Lyc, by="Accesion")

And as I saw by doing it wrong 1 time, is needed to use the inverse variance, in order to correct the variance.

Lyc2$Var<-(1/Lyc2$Var)

After all the data steps, i run the code:

FTTglme = glmer(FTT ~ Accesion + Bloque + (1|Plot), data = Lyc2,    
family=poisson(link="log"), weights = Lyc2$Var)

That fixes the variance issue:

Corrected variance residuals boxplot

  • Good for you for solving the problem. This isn't exactly the same as *estimating* the different variances for each group. https://stackoverflow.com/questions/21409340/how-to-allow-for-factor-specific-variance-of-random-effect-in-lme gives a link, but it's not terribly practical for your example since it requires setting up separate dummy variables for each group ... – Ben Bolker Aug 23 '17 at 00:28