1

I have trouble understanding how to 'export' equations of the delimitation lines resulting from a linear (LDA) or quadratic discriminant analysis (QDA) in R.

Ideally, I would like to compare both delineations (linear + quadratic curves) on the same graph. Here is an example of output (taken from http://adjchen.com/wiki/classification), with 2 groups. But it should work the same with multiples groups :

enter image description here

I have looked into this partial solution on stackoverflow, which I believe is a bit too sophisticated to draw a simple linear delimitation ?

How to plot classification borders on an Linear Discrimination Analysis plot in R

Here is an adaptation of the data set they use to play with in that example:

library(MASS)

# generate data
set.seed(123)
Ng <- 100 # number of cases per group
group.a.x <- rnorm(n = Ng, mean = 2, sd = 3)
group.a.y <- rnorm(n = Ng, mean = 2, sd = 3)

group.b.x <- rnorm(n = Ng, mean = 11, sd = 3)
group.b.y <- rnorm(n = Ng, mean = 11, sd = 3)

group.a <- data.frame(x = group.a.x, y = group.a.y, group = "A")
group.b <- data.frame(x = group.b.x, y = group.b.y, group = "B")

my.xy <- rbind(group.a, group.b)

# construct models
mdlLDA <- lda(group ~ x + y, data = my.xy)

mdlQDA <- qda(group ~ x + y, data = my.xy)

The closest alternative I came with is the partimat()function from klaRpackage, which however can be hard to modify or customize. Is there any other option using base R plot and equations of the delimitation line / curve ? or a ggplot function maybe ? Thanks in advance for any thoughts on this :) Best.

Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63
Alex
  • 158
  • 5

1 Answers1

0

Here is how you can plot the LDA and QDA decision boundaries on the same plot with ggplot2. Note that the points with darker colors are trainign data points, purple line is the linear decision boundary with LDA and the green curve is the nonlinear decision boundary with QDA.

library(MASS)

# generate data with nonlinear decision boundary
set.seed(123)
Ng <- 500 # number of cases per group

my.xy <- data.frame(x = rnorm(Ng, 2, 3), y = rnorm(Ng, 2, 3))

my.xy$group = 'A'
my.xy[my.xy$x^2/36 + my.xy$y^2/64 < 1,]$group = 'B'

# construct models
mdlLDA <- lda(group ~ x + y, data = my.xy)
mdlQDA <- qda(group ~ x + y, data = my.xy)


#create test data
np = 50
x = seq(from = min(my.xy$x), to = max(my.xy$x), length.out = np)
y = seq(from = min(my.xy$y), to = max(my.xy$y), length.out = np)
df <- expand.grid(x = x, y = y)

df$classL <- as.numeric(predict(mdlLDA, newdata = df)$class)
df$classQ <- as.numeric(predict(mdlQDA, newdata = df)$class)

df$classLf <- ifelse(df$classL==1, 'A', 'B') 
df$classQf <- ifelse(df$classQ==1, 'A', 'B') 

library(tidyverse)
my.xy %>% ggplot() + geom_point(aes(x,y,col=group)) + 
       geom_point(data=df, aes(x,y,col=classQf), alpha=0.1) + 
       geom_contour(data=df, aes(x,y,z=classQ), col='green') + 
       geom_contour(data=df, aes(x,y,z=classL), col='purple', lty=4) + 
       scale_colour_manual(values=c("blue", "red")) + 
  theme_bw() 

enter image description here

Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63