3

I am very very new to R and stats in general, and am having trouble adding multiple confidence ellipses to a PCA plot.

My interest is in highlighting potential groupings/clusters in the PCA plot with 95% confidence ellipses. I have tried using the dataEllipse function in R, however I cannot figure out how to add multiple ellipses with different centers to the PCA plot (the centers would be at various points that appear to contain a cluster, in this case lithic sources and lithic tools likely made from that source).

Thanks for any help with this! {

lithic_final <- LITHIC.DATASHEET.FOR.R.COMPLETE.FORMAT
lithic_final

pca1 <- princomp(lithic_final); pca1

lithic_source <- c("A1", "A1", "A1", "A1", "A2","A2", "A2", "A3","A3","A3","B","B","B","B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D","D","D","E","E","E","E","E","E","E","E","F","F","G","G","G","G","H","H","H","H","H","H","H","I1","I1","I1","I2","I2","I2","I2","I2","J1","J1","J2","J2","J2","J2","J2","J2","J2","J2","J2","K","K","K","K","K","K","K","L","L","L","L","L","L","L","L","L","L","L","L","L","L","BB1","BB1","BB1","FC","FC","FC","JRPP","JRPP","JRPP","BB2","BB2","BB2","BB2","MWP","MWP","MWP","MWP","RPO","RPO","RPO")

lithic_source

summary(pca1)

plot(pca1)

#Plotting the scores with the Lithic Source Info
round(pca1$scores[,1:2], 2)
pca_scores <-round(pca1$scores[,1:2], 2)
plot(pca1$scores[,1], pca1$scores[,2], type="n")
text(pca1$scores[,1], pca1$scores[,2],labels=abbreviate(lithic_source, minlength=3), cex=.45)



#Plotting PCA Scores of EACH SAMPLE for PCA 2 and 3 with Lithic Source Info
round(pca1$scores[,2:3], 2)
pca2_3_scores <-round(pca1$scores[,2:3], 2)
plot(pca1$scores[,2], pca1$scores[,3], type="n")
text(pca1$scores[,2], pca1$scores[,3], labels=abbreviate(lithic_source, minlength=3), cex=.45)

#Plotting PCA Scores of EACH SAMPLE for PCA 3 and 4 with Lithic Source Info
round(pca1$scores[,3:4], 2)
pca3_4_scores <-round(pca1$scores[,3:4], 2)
plot(pca1$scores[,3], pca1$scores[,4], type="n")
text(pca1$scores[,3], pca1$scores[,4], labels=abbreviate(lithic_source, minlength=3), cex=.45)

#Plotting PCA Scores of EACH SAMPLE for PCA 1 and 3 with Lithic Source Info
round(pca1$scores[,1:3], 2)
pca1_3_scores <-round(pca1$scores[,1:3], 2)
plot(pca1$scores[,1], pca1$scores[,3], type="n")
text(pca1$scores[,1], pca1$scores[,3], labels=abbreviate(lithic_source, minlength=3), cex=.45)

#Plotting PCA Scores of EACH SAMPLE for PCA 1 and 4 with Lithic Source Info
round(pca1$scores[,1:4], 2)
pca1_4_scores <-round(pca1$scores[,1:4], 2)
plot(pca1$scores[,1], pca1$scores[,4], type="n")
text(pca1$scores[,1], pca1$scores[,4], labels=abbreviate(lithic_source, minlength=3), cex=.45)

#TRYING TO GET ELLIPSES ADDED TO PCA 1 and 4 scores
dataEllipse(pca1$scores[,1], pca1$scores[,4],centers=12,add=TRUE,levels=0.9, plot.points=FALSE)


structure(list(Ca.K12 = c(418L, 392L, 341L, 251L, 297L, 238L, 
258L, 5L, 2L, 37L), Cr.K12 = c(1L, 12L, 15L, 6L, 9L, 6L, 35L, 
7L, 45L, 32L), Cu.K12 = c(89L, 96L, 81L, 63L, 88L, 103L, 104L, 
118L, 121L, 90L), Fe.K12 = c(18627L, 18849L, 18413L, 12893L, 
17757L, 17270L, 16198L, 2750L, 4026L, 3373L), K.K12 = c(20L, 
23L, 28L, 0L, 34L, 17L, 45L, 102L, 150L, 147L), Mn.K12 = c(205L, 
212L, 235L, 120L, 216L, 212L, 246L, 121L, 155L, 115L), Nb.K12 = c(139L, 
119L, 154L, 91L, 122L, 137L, 137L, 428L, 414L, 428L), Rb.K12 = c(99L, 
42L, 79L, 49L, 210L, 243L, 168L, 689L, 767L, 705L), Sr.K12 = c(3509L, 
3766L, 3481L, 2715L, 2851L, 2668L, 2695L, 202L, 220L, 217L), 
    Ti.K12 = c(444L, 520L, 431L, 293L, 542L, 622L, 531L, 82L, 
    129L, 84L), Y.K12 = c(135L, 121L, 105L, 74L, 144L, 79L, 85L, 
    301L, 326L, 379L), Zn.K12 = c(131L, 133L, 108L, 78L, 124L, 
    111L, 114L, 81L, 78L, 59L), Zr.K12 = c(1348L, 1479L, 1333L, 
    964L, 1506L, 1257L, 1296L, 3967L, 4697L, 4427L)), .Names = c("Ca.K12", 
"Cr.K12", "Cu.K12", "Fe.K12", "K.K12", "Mn.K12", "Nb.K12", "Rb.K12", 
"Sr.K12", "Ti.K12", "Y.K12", "Zn.K12", "Zr.K12"), row.names = c(NA, 
10L), class = "data.frame")
guido
  • 31
  • 1
  • 3
  • 1
    please post some code that minimally reproduces what you have so far so people have something to go on – tim riffe Aug 22 '12 at 17:28
  • [Like so](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – ROLO Aug 22 '12 at 17:56
  • Ok, added some code. Sorry about that, new to all of this and apologies for any further mistakes and the lack of elegance in my code. The last line is the closest I can get to adding any kind of ellipse to my dataset. – guido Aug 22 '12 at 18:17
  • No problem, welcome on SO btw. Can you use `dput(lithic_final)` to print (a subset of) the data in a format that is easily imported and post it here (or post it on pastebin or so)? You also might want to look into [this question](http://stats.stackexchange.com/questions/24450/how-to-highlight-predefined-groups-in-pca-individual-map) and the [package](http://factominer.free.fr/) mentioned there. – ROLO Aug 22 '12 at 18:40
  • Ok, will do. Thanks for the links. I am posting the output from (only the first 10 rows) to the post above. Is this enough to go on or did I miss a step? Thanks for your patience and help! – guido Aug 22 '12 at 19:06
  • Quick update: I checked out the FactoMineR package and am definitely liking it better for PCA than the stock R one. However, I run into the same issue as this [user](http://stats.stackexchange.com/questions/24450/how-to-highlight-predefined-groups-in-pca-individual-map) in that I get the `Error in if (scale[1] > 0) r <- r/scale[1] : missing value where TRUE/FALSE needed` when I try to run the coord.ellipse function. Ive tried all I can with it, but no luck. Sigh. Anyway, thanks for any help with this! – guido Aug 22 '12 at 21:09

2 Answers2

6

I think you would have received a speedier reply if you had focused on your question instead of all the extraneous stuff. You gave us your commands for plotting a bunch of principal components that had nothing to do with your question. The question is, how do you plot ellipses by group? Your sample data at 10 lines and three groups is not helpful because 3 points is not enough to plot data ellipses. You are using the dataEllipse function in package car which has the simplest answer to your question:

First, a reproducible example:

set.seed(42) # so you can get the same numbers I get
source_a <- data.frame(X1=rnorm(25, 50, 5), X2=rnorm(25, 40, 5))
source_b <- data.frame(X1=rnorm(25, 20, 5), X2=rnorm(25, 40, 5))
source_c <- data.frame(X1=rnorm(25, 35, 5), X2=rnorm(25, 25, 5))
lithic_dat <- rbind(source_a, source_b, source_c)
lithic_source <- c(rep("a", 25), rep("b", 25), rep("c", 25))

Plot ellipses with scatterplot() and add text:

scatterplot(X2~X1 | lithic_source, data=lithic_dat, pch="", smooth=FALSE, 
     reg.line=FALSE, ellipse=TRUE, levels=.9)
text(lithic_dat$X1, lithic_dat$X2, lithic_source, cex=.75)

Scatterplot can be tweaked to do everything you want, but it is also possible to plot the ellipses without using it:

sources <- unique(lithic_source) # vector of the different sources
plot(lithic_dat$X1, lithic_dat$X1, type="n")
text(lithic_dat$X1, lithic_dat$X2, lithic_source, cex=.75)
for (i in sources) with(lithic_dat, dataEllipse(X1[lithic_source==i],
     X2[lithic_source==i], levels=.9, plot.points=FALSE))

This will work for your principal components and any other data.

dcarlson
  • 10,936
  • 2
  • 15
  • 18
3

Here is a simple solution using a package called ggbiplot (available on github) with Iris data. I hope this is what you were looking for.

library(devtools);install_github('vqv/ggbiplot')
library(ggbiplot)
pca = prcomp(iris[,1:4]) 
ggbiplot(pca,groups = iris$Species,ellipse = T,ellipse.prob = .95)