I have three data frames and want to perform a Principal Component Analysis (PCA) in R. I merged the data frames with rbind() and ran a PCA on the result, which worked. However, I want to distinguish the points according to the data frame they belong to. With the merged data frame that seems impossible (or is it?). When I use PCA(X=c(df1,df2,df3)), it complains about differing numbers of rows (which is indeed the case).

pca <- PCA(X=c(df1,df2,df3))
fviz_pca_ind(pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = c(df1,df2,df3), # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups"
             )

That is not working...

How can I perform a PCA on variables from three different data frames and color the points by the data frame they came from? I have no reprex because it is difficult to provide one in this case.

Thank you all for your suggestions ;)

takeITeasy
    Can you make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? It could be toy data or small samples. Any reason why you can't row-bind the data frames, adding a variable to show which dataset they came from, and do PCA on that? – camille Feb 12 '20 at 14:48

1 Answer

You need the number of rows of each data frame to build the group labels. One way is shown below, where I collect the three data frames in a list:

library(FactoMineR)
library(factoextra)

df1 = subset(iris,Species=="setosa")[,-5]
df2 = subset(iris,Species=="versicolor")[,-5]
df3 = subset(iris,Species=="virginica")[,-5]

X = list(df1=df1,df2=df2,df3=df3)

Combine them with do.call(rbind, X). The labels are the names of the data frames, each repeated by that data frame's number of rows:

labels = rep(names(X),sapply(X,nrow))
table(labels)
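
As a quick sanity check, here is a minimal sketch on toy data with unequal row counts, which is the situation described in the question. The row counts 10, 20 and 30 and the names Y, combined_y and labels_y are arbitrary choices for illustration, not from the question. It only confirms that the label vector has one entry per row of the combined data frame:

```r
# Toy data: same columns, deliberately different numbers of rows
Y <- list(
  df1 = subset(iris, Species == "setosa")[1:10, -5],
  df2 = subset(iris, Species == "versicolor")[1:20, -5],
  df3 = subset(iris, Species == "virginica")[1:30, -5]
)

# Stack the rows; rbind keeps the order of the list elements
combined_y <- do.call(rbind, Y)

# One label per row: each list name repeated by that data frame's row count
labels_y <- factor(rep(names(Y), times = sapply(Y, nrow)))

# The labels must line up one-to-one with the combined rows
stopifnot(length(labels_y) == nrow(combined_y))
table(labels_y)
```

Because rep() follows the same order as the list, the labels stay aligned with the rows no matter how many rows each data frame has.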

Then plot, passing labels as col.ind:

pca <- PCA(do.call(rbind,X))
fviz_pca_ind(pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = labels, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups"
)
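
The comment under the question suggests another route: add a column recording the source data frame before row-binding. The sketch below is one way that might look. Note that it swaps in base R's prcomp() and plot() instead of FactoMineR and factoextra, so it runs without extra packages. The column name group and the object names g1, g2, g3, all_rows, pc and cols are assumptions made for illustration.

```r
# Tag each data frame with its origin ("group" is a hypothetical column name)
g1 <- transform(subset(iris, Species == "setosa")[, -5], group = "df1")
g2 <- transform(subset(iris, Species == "versicolor")[, -5], group = "df2")
g3 <- transform(subset(iris, Species == "virginica")[, -5], group = "df3")

# Row-bind; the group column travels with each row
all_rows <- rbind(g1, g2, g3)
all_rows$group <- factor(all_rows$group)

# PCA on the numeric columns only; scale. = TRUE standardizes each variable
pc <- prcomp(all_rows[, names(all_rows) != "group"], scale. = TRUE)

# Plot the first two components, colored by source data frame
cols <- c("#00AFBB", "#E7B800", "#FC4E07")
plot(pc$x[, 1:2], col = cols[as.integer(all_rows$group)], pch = 19)
legend("topright", legend = levels(all_rows$group),
       col = cols, pch = 19, title = "Groups")
```

Keeping the group as a column means the labels cannot drift out of sync with the rows if the combined data frame is later filtered or reordered.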

StupidWolf