16

I am working on finalizing a NMDS plot that I created in vegan and ggplot2 but cannot figure out how to add envfit species-loading vectors to the plot. When I try to it says "invalid graphics state".

The example below is slightly modified from another question (Plotting ordiellipse function from vegan package onto NMDS plot created in ggplot2) but it expressed exactly the example I wanted to include since I used this question to help me get metaMDS into ggplot2 in the first place:

library(vegan)
library(ggplot2)
data(dune)

# calculate distance for NMDS
NMDS.log<-log(dune+1)
sol <- metaMDS(NMDS.log)

# Create meta data for grouping
MyMeta = data.frame(
  sites = c(2,13,4,16,6,1,8,5,17,15,10,11,9,18,3,20,14,19,12,7),
  amt = c("hi", "hi", "hi", "md", "lo", "hi", "hi", "lo", "md", "md", "lo", 
      "lo", "hi", "lo", "hi", "md", "md", "lo", "hi", "lo"),
row.names = "sites")

# plot NMDS using basic plot function and color points by "amt" from MyMeta
plot(sol$points, col = MyMeta$amt)

# same in ggplot2
NMDS = data.frame(MDS1 = sol$points[,1], MDS2 = sol$points[,2])
ggplot(data = NMDS, aes(MDS1, MDS2)) + 
  geom_point(aes(data = MyMeta, color = MyMeta$amt))

#Add species loadings
vec.sp<-envfit(sol$points, NMDS.log, perm=1000)
plot(vec.sp, p.max=0.1, col="blue")
Community
  • 1
  • 1
HFBrowning
  • 2,196
  • 3
  • 23
  • 42
  • 1
    +1 Excellent question. Do study my Answer to see how one should go about extracting scores from objects or plotting them. You shouldn't, in most instances with **vegan** need to do things like `sol$points` as we provide `scores()` methods for this as well as a `plot()` method. – Gavin Simpson May 10 '13 at 15:53

4 Answers4

14

The problem with the (otherwise excellent) accepted answer, and which explains why the vectors are all of the same length in the included figure [Note that the accepted Answer has now been edited to scale the arrows in the manner I describe below, to avoid confusion for users coming across the Q&A], is that what is stored in the $vectors$arrows component of the object returned by envfit() are the direction cosines of the fitted vectors. These are all of unit length, and hence the arrows in @Didzis Elferts' plot are all the same length. This is different to the output from plot(envfit(sol, NMDS.log)), and arises because we scale the vector arrow coordinates by the correlation with the ordination configuration ("axes"). That way, species that show a weak relationship with the ordination configuration get shorter arrows. The scaling is done by multiplying the direction cosines by sqrt(r2) where r2 are the values shown in the table of printed output. When adding the vectors to an existing plot, vegan also tries to scale the set of vectors such that they fill the available plot space whilst maintaining the relative lengths of the arrows. How this is done is discussed in the Details section of ?envfit and requires the use of the un-exported function vegan:::ordiArrowMul(result_of_envfit).

Here is a full working example that replicates the behaviour of plot.envfit using ggplot2:

library(vegan)
library(ggplot2)
library(grid)
data(dune)

# calculate distance for NMDS
NMDS.log<-log1p(dune)
set.seed(42)
sol <- metaMDS(NMDS.log)

scrs <- as.data.frame(scores(sol, display = "sites"))
scrs <- cbind(scrs, Group = c("hi","hi","hi","md","lo","hi","hi","lo","md","md",
                              "lo","lo","hi","lo","hi","md","md","lo","hi","lo"))

set.seed(123)
vf <- envfit(sol, NMDS.log, perm = 999)

If we stop at this point and look at vf:

> vf

***VECTORS

             NMDS1       NMDS2     r2 Pr(>r)    
Belper -0.78061195 -0.62501598 0.1942  0.174    
Empnig -0.01315693  0.99991344 0.2501  0.054 .  
Junbuf  0.22941001 -0.97332987 0.1397  0.293    
Junart  0.99999981 -0.00062172 0.3647  0.022 *  
Airpra -0.20995196  0.97771170 0.5376  0.002 ** 
Elepal  0.98959723  0.14386566 0.6634  0.001 ***
Rumace -0.87985767 -0.47523728 0.0948  0.429
.... <truncated>

So the r2 data is used to scale the values in columns NMDS1 and NMDS2. The final plot is produced with:

spp.scrs <- as.data.frame(scores(vf, display = "vectors"))
spp.scrs <- cbind(spp.scrs, Species = rownames(spp.scrs))

p <- ggplot(scrs) +
  geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = Group)) +
  coord_fixed() + ## need aspect ratio of 1!
  geom_segment(data = spp.scrs,
               aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
               arrow = arrow(length = unit(0.25, "cm")), colour = "grey") +
  geom_text(data = spp.scrs, aes(x = NMDS1, y = NMDS2, label = Species),
            size = 3)

This produces:

enter image description here

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • As long as coord_fixed() is kept at ratio=1, will the point locations in a ggplot always match the (correct) point locations of a Vegan ordiplot? – Vinterwoo Dec 17 '13 at 07:12
  • 2
    @Vinterwoo With or without the `coord_fixed` & `ratio = 1` the points will match the correct point locations in the `plot` method for `envfit`. The `coord_fixed` et al parts are required to maintain the metric properties of the ordination solutions; both axes should be scaled in the same "units". – Gavin Simpson Dec 17 '13 at 13:35
  • Thanks for that Gavin. In regards to using envfit vectors to overlay information about how species effect site ordination location - It was my (most likely flawed) understanding that this kind of info could not be pulled out of a NMDS analysis. And that it was impossible to get info on the variance explained by ordination plots. It seems the significance codes from envfit, and the location of the arrows in spp.scores seem to give this kind of info. My apologies if this comment address the more global aspects of the original question, and not the specifics of your excellent answer. – Vinterwoo Dec 18 '13 at 05:39
  • 1
    Your understanding is only relevant if the NMDS doesn't have access to the species data. NMDS itself doesn't use this info; that is embedded in the dissimilarity matrix (Dij) and NMDS uses that. However, we can project vectors or points into the NMDS solution using ideas familiar from other methods. `metaMDS`'s `plot` method can add species points as weighted averages of the NMDS site scores if you fit the model using the raw data not the Dij. `envfit` uses the well-established method of vector fitting, *post hoc*. – Gavin Simpson Dec 18 '13 at 14:12
  • 1
    @Vinterwoo The main thing to note though is that these so-derived scores (whichever way you do it) are somewhat removed from the "species" info embedded in the NMDS solution. That itself depends on the dissimilarity coefficient used. – Gavin Simpson Dec 18 '13 at 14:13
  • @Vinterwoo You might also be interested in this related [question](http://stackoverflow.com/a/16735864/429846) – Gavin Simpson Dec 18 '13 at 19:19
  • When I do this, my plot's scale is changed. It seems to be because `Species` are wanting to push the axis boundaries which just shrinks my actual plot. Any idea how to stop this? https://imgur.com/a/pIjnwBC – Lamma Jan 31 '22 at 11:51
  • @Lamma perhaps you need to multiply by the factor obtained with vegan:::ordiArrowMul(result_of_envfit). – Cliff Bueno Oct 25 '22 at 14:37
  • @GavinSimpson is there a reason why you did not multiply by the factor obtained by vegan:::ordiArrowMul(result_of_envfit) in your answer? This was mentioned but not implemented (as shown by the answer by crazysantaclaus)? Thanks. – Cliff Bueno Oct 25 '22 at 14:39
10

Start with adding libraries. Additionally library grid is necessary.

library(ggplot2)
library(vegan)
library(grid)
data(dune)

Do metaMDS analysis and save results in data frame.

NMDS.log<-log(dune+1)
sol <- metaMDS(NMDS.log)

NMDS = data.frame(MDS1 = sol$points[,1], MDS2 = sol$points[,2])

Add species loadings and save them as data frame. Directions of arrows cosines are stored in list vectors and matrix arrows. To get coordinates of the arrows those direction values should be multiplied by square root of r2 values that are stored in vectors$r. More straight forward way is to use function scores() as provided in answer of @Gavin Simpson. Then add new column containing species names.

vec.sp<-envfit(sol$points, NMDS.log, perm=1000)
vec.sp.df<-as.data.frame(vec.sp$vectors$arrows*sqrt(vec.sp$vectors$r))
vec.sp.df$species<-rownames(vec.sp.df)

Arrows are added with geom_segment() and species names with geom_text(). For both tasks data frame vec.sp.df is used.

ggplot(data = NMDS, aes(MDS1, MDS2)) + 
  geom_point(aes(data = MyMeta, color = MyMeta$amt))+
  geom_segment(data=vec.sp.df,aes(x=0,xend=MDS1,y=0,yend=MDS2),
      arrow = arrow(length = unit(0.5, "cm")),colour="grey",inherit_aes=FALSE) + 
  geom_text(data=vec.sp.df,aes(x=MDS1,y=MDS2,label=species),size=5)+
  coord_fixed()

enter image description here

Didzis Elferts
  • 95,661
  • 14
  • 264
  • 201
  • This worked great! With a little manipulation I was able to make it look really nice and strangely the loadings seem to be more accurate than the ones from envfit were. I think those may have shown the p-value through the length of the line but I would rather report that myself. Thank you! :) – HFBrowning Feb 05 '13 at 23:27
  • Hmm, something seems off here - the lengths of the vectors *shouldn't* be the same... They aren't in the output from `envfit`. And @HFBrowning the length of the vector simply indicates the strength of the correlation; it is not a direct indication of the *p* value. – Gavin Simpson May 10 '13 at 15:20
  • Ahh, and I see now - you shouldn't delve into objects and rip out components. Use the accessor functions - in this case `scores(vec.sp, display = "vectors")`. See my Answer for a fully worked through example. – Gavin Simpson May 10 '13 at 15:38
  • Yes, you are right, this isn't the way it should be. I'll delete my answer as soon as @HFBrowning will accept yours. – Didzis Elferts May 10 '13 at 15:47
  • @DidzisElferts Rather than delete it, perhaps just update it? I mean you did most of the hard work which I just copied for my Answer, only modifying a few bits for style and to extract the same data that the `plot.envfit` methods uses. – Gavin Simpson May 10 '13 at 15:50
  • @GavinSimpson Updated my answer - solution without function scores() :) – Didzis Elferts May 10 '13 at 16:03
4

May i add something late?

Envfit provides pvalues, and sometimes you want to just plot the significant parameters (something vegan can do for you with p.=0.05 in the plot command). I struggled to do that with ggplot2. Here is my solution, maybe you find a more elegant one?

Starting from Didzis' answer from above:

ef<-envfit(sol$points, NMDS.log, perm=1000)
ef.df<-as.data.frame(ef$vectors$arrows*sqrt(ef$vectors$r))
ef.df$species<-rownames(ef.df)

#only significant pvalues
#shortcutting ef$vectors
A <- as.list(ef$vectors)
#creating the dataframe
pvals<-as.data.frame(A$pvals)
arrows<-as.data.frame(A$arrows*sqrt(A$r))
C<-cbind(arrows, pvals)
#subset
Cred<-subset(C,pvals<0.05)
Cred <- cbind(Cred, Species = rownames(Cred))

"Cred "can now be implemented in the geom_segment-argument as discussed above.

nouse
  • 3,315
  • 2
  • 29
  • 56
2

Short addition: To get a full representation of the plot.envfit functionality within ggplot2 aka "arrow lengths make full use of plot area" a factor needs to be applied. I don't know if it was intentionally left out in the answers above, as it was even specifically mentioned by Gavin? Just extract the required scaling factor using arrow_factor <- ordiArrowMul(vf) and then you can either apply it to both NMDS columns in spp.scrs or you can do this manually like

arrow_factor <- ordiArrowMul(vf)
spp.scrs <- as.data.frame(scores(vf, display = "vectors")) * arrow_factor
spp.scrs <- cbind(spp.scrs, Species = rownames(spp.scrs), Pvalues = vf$vectors$pvals, R_squared = vf$vectors$r)

# select significance similarly to `plot(vf, p.max = 0.01)`
spp.scrs <- subset(spp.scrs, Pvalues < 0.01)

# you can also add the arrow factor in here (don't do both!)
ggplot(scrs) +
  geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = Group)) +
  coord_fixed() + ## need aspect ratio of 1!
  geom_segment(data = spp.scrs,
               aes(x = 0, xend = NMDS1 * arrow_factor, y = 0, yend = NMDS2 * arrow_factor),
               arrow = arrow(length = unit(0.25, "cm")), colour = "grey") +
  geom_text(data = spp.scrs, aes(x = NMDS1 * arrow_factor, y = NMDS2 * arrow_factor, label = Species),
            size = 3)
crazysantaclaus
  • 613
  • 5
  • 19