My variables are measured on a randomized block with a subsampling design where my treatments are the 23 Accesion. I have 3 complete blocks and 6 samples per block. The example dataframe has 4 response variables (LH, REN, FTT, DFR), Accesion (treatment), Bloque (Block number) and Plot (that is the variable accounting for the subsampling). The head of the data is:
Plot Accesion Bloque LH REN FTT DFR
1 221 22 1 20.6 1127 23 88
2 221 22 1 20.5 1638 20 88
3 221 22 1 24.5 1319 16 88
4 221 22 1 21.4 960 17 88
5 221 22 1 25.7 1469 18 88
6 221 22 1 25.8 1658 21 88
So, the data is non-normal and heteroscedastic for almost all of the 100 response variables after all types of transformations (log, boxcox, power, etc). Most of the variables show a chi-squared or Poisson-like distribution with different variance for each Accesion.
So far I've beeng working on running a generalized linear model-effect with Poisson using the glmer()
on the FTT as a response variable. I am using this code:
FTTglme = glmer(FTT ~ Accesion + Bloque + (1|Plot), data = Lyc,
family=poisson(link="identity"))
The residuals are non-normal according to the shapiro.test(). That, I assume, is because of the heteroscedasticity observed in the residuals. As a boxplot of the residuals by Accesion, shows the difference of variances:
The heteroscedasticity is expected between plant populations, but I know it can be modelled inside the glme. The code that I should add, as I have investigated already, is:
vf <- varIdent(form=~Accesion)
FTTglme = glmer(FTT ~ Accesion + Bloque + (1|Plot), data = Lyc,
family=poisson(link="identity"), weights = vf)
I want the different variances to account for each Accesion category. But I keep getting the error:
Error in model.frame.default(data = Lyc, weights = varIdent(form = ~Accesion), : variable lengths differ (found for '(weights)')
Does anyone know how to account for the differences of variance between the Accesions inside the glmer()
?
Any other suggestions to analyze the data is also welcome.