R: How to use ggbiplot with pcaRes object? plot PCA results of data with missing values

Question

I usually perform Principal Component Analyses with the prcomp function, and plot the results in a fancy way with ggbiplot (or otherwise just with ggplot2 extracting pca.obj$x).

Like this:

#install_github("vqv/ggbiplot")
library(ggbiplot)
data(iris)
pca.obj <- prcomp(iris[,1:4], center=TRUE, scale.=TRUE)
P <- ggbiplot(pca.obj,
         obs.scale = 1, 
         var.scale=1,
         ellipse=T,
         circle=F,
         varname.size=3,
         var.axes=T,
         groups=iris$Species, #no need for coloring, I'm making the points invisible
         alpha=0) #invisible points, I add them below
P$layers <- c(geom_point(aes(color=iris$Species), cex=5), P$layers) #add geom_point in a layer underneath (only way I have to change the size of the points in ggbiplot)
png(filename="test.png", height=600, width=600)
print(#or ggsave()
    P
)
dev.off()

HOWEVER, now I am facing data with some amount of NAs, and I am using the pca wrapper function from the pcaMethods package, applying the nipals method (an iterative method capable of handling small amounts of missing values).

pca returns an object of class pcaRes, and ggbiplot returns the following error:

#introduce NAs
iris$Sepal.Length[sample(1:150, 5)] <- NA
iris$Sepal.Width[sample(1:150, 5)] <- NA
iris$Petal.Length[sample(1:150, 5)] <- NA
iris$Petal.Width[sample(1:150, 5)] <- NA
#pca.obj2 <- prcomp(iris[,1:4], center=TRUE, scale.=TRUE) #cannot use prcomp with NAs
#source("https://bioconductor.org/biocLite.R")
#biocLite("pcaMethods")
library(pcaMethods)
pca.obj2 <- pca(iris[,1:4], method="nipals", nPcs=3, center=TRUE, scale.=TRUE)
class(pca.obj2)
ggbiplot(pca.obj2)

Error in ggbiplot(pca.obj2) : Expected a object of class prcomp, princomp, PCA, or lda

MY QUESTIONS are:

How can I apply ggbiplot to a pcaRes object?

How can I convert this object to a prcomp object?

Can I obtain the same kind of plot with another function instead of ggbiplot that accepts a pcaRes object?

Should I just replace the NA values with the mean of the variable and apply the prcomp function as usual?

Many thanks!

You need to use `str(pca.obj2)` and compare the result with the result of `prcomp`. Then try to make it compatible with `ggbiplot` — user101089, Apr 09 '18 at 02:58

score 1 · Accepted Answer · answered Apr 12 '18 at 04:53

First of all, good job finding a PCA package that will handle NAs.

Since ggbiplot will not accept pcaRes objects, we can use the data obtained by the pcaRes and sneak it into the original prcomp object.

Obviously your real data will already contain the NA values, so we'll start with that data set and swap them out for some dummy values to allow us to run the first prcomp pca.

iris_na<-iris

iris_na$Sepal.Length[sample(1:150, 5)] <- NA
iris_na$Sepal.Width[sample(1:150, 5)] <- NA
iris_na$Petal.Length[sample(1:150, 5)] <- NA
iris_na$Petal.Width[sample(1:150, 5)] <- NA

iris_dummy<-iris_na

iris_dummy[is.na(iris_dummy)]<-7777 #swap out your NAs with a dummy number so prcomp will run

We then run the first pca as you did:

pca.obj <- prcomp(iris_dummy[,1:4], center=TRUE, scale.=TRUE)

There are 5 components of this object, x (scores), rotation (loadings), sdev (standard deviation), center and scale. Although I suspect that only the scores and the loadings are used by ggbiplot, we'll swap them all out just to be sure.

Looking at the scores component pca.obj$x shows us that four principal components have been calculated in the prcomp function.

head(pca.obj$x)

#           PC1        PC2         PC3         PC4
#[1,] -2.656740  0.3176722  0.03763067 -0.04122948
#[2,] -2.688275 -0.1821744  0.19912795  0.07297624
#[3,] -2.862673 -0.1447518 -0.02134749 -0.02462359
#[4,] -2.718294 -0.3189371 -0.03318459 -0.11675762
#[5,] -2.700864  0.3274887 -0.07503096 -0.11347939
#[6,] -2.252918  0.7436711 -0.14611455 -0.08218007

So when we run the next pca with pcaRes, we make sure to specify that 4 principal components are calculated using the nPcs argument. Here we are using the true data, which contains the NAs.

pca.obj2 <- pca(iris_na[,1:4], method="nipals", nPcs=4, center=TRUE, scale.=TRUE)

It is then just a matter of swapping out the pcaRes values for the prcomp values and passing this to ggbiplot

pca.obj$x<-pca.obj2@scores 

pca.obj$rotation<-pca.obj2@loadings 

pca.obj$sdev<-pca.obj2@sDev

pca.obj$center<-pca.obj2@center

pca.obj$scale<-pca.obj2@scale

P2 <- ggbiplot(pca.obj,
              obs.scale = 1, 
              var.scale=1,
              ellipse=T,
              circle=F,
              varname.size=3,
              var.axes=T,
              groups=iris$Species, 
              alpha=0) 
P2$layers <- c(geom_point(aes(color=iris$Species), cex=5), P2$layers)

R: How to use ggbiplot with pcaRes object? plot PCA results of data with missing values

1 Answers1