
I have a really bizarre problem. I am running an ANOVA on an `lme` mixed-effects model that has "city" as one of the factors. There are three cities total for this variable, and the levels are ordered alphabetically by default. However, if I reorder the cities by latitude (thus changing the level order) using `df$v1 <- factor(df$v1, levels = c("B", "A", "C"))`, I get a totally different p-value in my ANOVA results. My lme model is `mod <- lme(v3 ~ v2*v1, random = ~1|v4, data = df)`, and my ANOVA call is `anova(mod, type = 'marginal')`.
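
Concretely, the two runs look roughly like this (a minimal sketch, assuming `lme()` is from the **nlme** package; the level names in `levels =` are placeholders for the actual city names):

    ## Sketch of the two runs (assumes nlme::lme; level names are placeholders)
    library(nlme)

    # Run 1: default (alphabetical) level order for the city factor
    df$v1 <- factor(df$v1)
    mod1  <- lme(v3 ~ v2 * v1, random = ~ 1 | v4, data = df)
    anova(mod1, type = "marginal")

    # Run 2: same model after reordering the levels by latitude
    df$v1 <- factor(df$v1, levels = c("B", "A", "C"))
    mod2  <- lme(v3 ~ v2 * v1, random = ~ 1 | v4, data = df)
    anova(mod2, type = "marginal")
    # The p-values in the two marginal ANOVA tables differ.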

    str(df)
    'data.frame':   5157 obs. of  6 variables:
     $ family    : Factor w/ 296 levels "A_101","A_102",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ individual: Factor w/ 50 levels "1","10","1001",..: 1 17 21 32 43 46 47 48 49 2 ...
     $ city      : Factor w/ 3 levels "Miami","Tallahassee",..: 3 3 3 3 3 3 3 3 3 3 ...
     $ habitat   : Factor w/ 2 levels "Swamp","Beach": 2 2 2 2 2 2 2 2 2 2 ...
     $ temp      : int  21 21 21 21 21 21 21 21 21 21 ...
     $ shell_size: num  0.673 0.657 0.658 0.695 0.67 0.668 0.683 0.681 0.673 0.648 ...

    head(df)
    family individual        city habitat temp shell_size
     A_101          1       Miami   Swamp   21      0.673
     A_102          2       Miami   Swamp   23      0.657
     A_103          3 Tallahassee   Beach   31      0.658
     A_104          4   Key Largo   Beach   33      0.695
     A_105          5 Tallahassee   Swamp   26      0.670
     A_106          6   Key Largo   Swamp   31      0.668

How can changing the order of the cities possibly change the p-value? It shouldn't! I ran `lsmeans` with my city variable ordered both ways, by default (alphabetical) and by latitude, and the two sets of test results were identical.
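
The lsmeans check was roughly the following (sketch shown with the **emmeans** package, the successor to lsmeans; `mod1` and `mod2` are the two fits from the sketch above):

    library(emmeans)

    # Pairwise city comparisons from each fit
    emmeans(mod1, pairwise ~ v1)   # default (alphabetical) level order
    emmeans(mod2, pairwise ~ v1)   # levels reordered by latitude
    # The estimated marginal means and pairwise contrasts come out the same for both fits.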

  • Welcome to Stack Overflow. Please [format your code appropriately](https://meta.stackexchange.com/a/22189/371738). In addition, [provide example data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) in order to make your issue reproducible. – jay.sf Jul 03 '18 at 18:56
  • Thanks! I just made an edit where I formatted my code properly, and I added an example of my data. I appreciate your help. – Richard Gourderton Jul 04 '18 at 00:05
  • Without knowing the package the `lme()` function is in, I'm assuming the analysis requires a hold-out level and that the results are relative to the hold-out level. Thus changing the ordering of the factor levels changes the hold-out level and the results you are obtaining. – MHammer Jul 04 '18 at 05:45
  • As @MHammer said, dummy coding is used as a default for factors in R such that "Miami" is the reference category which is compared to "Tallahassee" as well as "Key Largo". Changing the factor changes the reference category changes the comparisons being made changes the results. See, for example, https://stats.stackexchange.com/a/233615/27276 – hplieninger Jul 04 '18 at 07:27
  • Thanks for the comments, MHammer and hplieninger. It just seems completely crazy that I get different p-values based on the factor order of the cities. So if I wanted to be dishonest, I could arrange my cities to purposely get a significant p-value? Or, if I had never fooled around with the order of the cities, I might have just reported the default p-value, which is significant, even though the reordered city values are not significant? This seems like a major blunder and oversight in how anova or lme works. How much published work is garbage and p-hacking because of this flaw? – Richard Gourderton Jul 04 '18 at 15:53
  • I wouldn't call it a major blunder or oversight or flaw. Many (all?) analyses require your estimation matrix to be invertible, and if you include an intercept term, you have to have a hold-out level for each variable. This is something that is well known in statistics. – MHammer Jul 05 '18 at 05:17
  • I agree with @MHammer. The problem is not poor statistics but poor understanding/interpretation. With categorical variables, you cannot estimate just one parameter as you would with a continuous variable. For example, if you have 3 groups, you can make 3 pairwise comparisons, but 1 is redundant. The preferred way to address this is to make 2 comparisons *derived from your hypotheses*, and R defaults to 2 comparisons based on dummy coding. And sure, different tests <-> different hypotheses -> different results. – hplieninger Jul 05 '18 at 08:50
  • An alternative way would be to look at the F-test of "city" (which is "blind" to the individual comparisons) and, if significant, *explore* all pairwise comparisons with [correction](https://en.wikipedia.org/wiki/Multiple_comparisons_problem) (e.g., `pairwise.t.test()` or package [emmeans](https://cran.r-project.org/package=emmeans)); see the sketch after these comments. – hplieninger Jul 05 '18 at 08:51
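
To make the comments concrete, here is a small sketch (same placeholder variable names as in the question, `lme()` again assumed to be `nlme::lme`) of how the reference level drives the default dummy coding, and of the follow-up workflow suggested in the last comment:

    # With R's default treatment ("dummy") coding, the first factor level is the
    # reference; the fitted v1 coefficients are comparisons against that level,
    # so releveling changes which comparisons (and which marginal tests) you get.
    contrasts(df$v1)                          # shows which level is currently the reference
    head(model.matrix(~ v2 * v1, data = df))  # dummy columns actually used in the fit

    # Follow-up suggested in the comments: overall F-test for city, then all
    # pairwise city comparisons with a multiplicity adjustment (here via emmeans).
    library(nlme)
    library(emmeans)
    mod <- lme(v3 ~ v2 * v1, random = ~ 1 | v4, data = df)
    anova(mod)                                     # overall (sequential) F-test for each term
    emmeans(mod, pairwise ~ v1, adjust = "tukey")  # all pairwise city comparisons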

0 Answers