3

I tried the following code related to lda and could not understand why it did not returns LD2 besides LD1.

library(MASS)
library(ggplot2)

load function ggplotLDAPrep() from here:

ggplotLDAPrep <- function(x){
  if (!is.null(Terms <- x$terms)) {
    data <- model.frame(x)
    X <- model.matrix(delete.response(Terms), data)
    g <- model.response(data)
    xint <- match("(Intercept)", colnames(X), nomatch = 0L)
    if (xint > 0L) 
      X <- X[, -xint, drop = FALSE]
  }
  means <- colMeans(x$means)
  X <- scale(X, center = means, scale = FALSE) %*% x$scaling
  rtrn <- as.data.frame(cbind(X,labels=as.character(g)))
  rtrn <- data.frame(X,labels=as.character(g))
  return(rtrn)
}


test<-iris[grep("setosa|virginica", iris$Species),1:5]
ldaobject <- lda(Species ~ ., data=test)
fitGraph <- ggplotLDAPrep(ldaobject)
ggplot(fitGraph, aes(LD1,LD2, color=labels))+geom_point()
ldaobject

Any insight ?

Community
  • 1
  • 1
user4178184
  • 89
  • 1
  • 8

1 Answers1

3

As mentioned by @user20650, you need at least 3 groups to return LD1 and LD2. See this example:

library(MASS)
library(ggplot2)

ggplotLDAPrep <- function(x){
  if (!is.null(Terms <- x$terms)) {
    data <- model.frame(x)
    X <- model.matrix(delete.response(Terms), data)
    g <- model.response(data)
    xint <- match("(Intercept)", colnames(X), nomatch = 0L)
    if (xint > 0L) 
      X <- X[, -xint, drop = FALSE]
  }
  means <- colMeans(x$means)
  X <- scale(X, center = means, scale = FALSE) %*% x$scaling
  rtrn <- as.data.frame(cbind(X,labels=as.character(g)))
  rtrn <- data.frame(X,labels=as.character(g))
  return(rtrn)
}


test<-iris[grep("setosa|virginica|versicolor", iris$Species),1:5]
ldaobject <- lda(Species ~ ., data=test)
fitGraph <- ggplotLDAPrep(ldaobject)
ggplot(fitGraph, aes(LD1,LD2, color=labels))+geom_point()

ggplot image

ldaobject

> ldaobject
Call:
lda(Species ~ ., data = test)

Prior probabilities of groups:
    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 

Group means:
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa            5.006       3.428        1.462       0.246
versicolor        5.936       2.770        4.260       1.326
virginica         6.588       2.974        5.552       2.026

Coefficients of linear discriminants:
                    LD1         LD2
Sepal.Length  0.8293776  0.02410215
Sepal.Width   1.5344731  2.16452123
Petal.Length -2.2012117 -0.93192121
Petal.Width  -2.8104603  2.83918785

Proportion of trace:
   LD1    LD2 
0.9912 0.0088

Plot the results

ggplot(fitGraph, aes(LD1,LD2, color=labels))+

EDIT: add ellipse

This code is mainly from here

    geom_point() + 
    stat_ellipse(aes(x=LD1, y=LD2, fill = labels), alpha = 0.2, geom = "polygon")

enter image description here

Community
  • 1
  • 1
loki
  • 9,816
  • 7
  • 56
  • 82
  • Very good example. Thank loki and also @20650. Another question, how do we make different symbols for each class and add ellipse (if not difficult to do). – user4178184 Mar 03 '16 at 01:19