I saw this neat principal component analysis graph online, where they had lines connecting each cluster to a center point.

Using an example data set, I have gotten as far as adding the ellipses, but from what I have read online, I don't think this PCA package can currently add the connecting lines, which are sometimes called a "star". Is there a workaround to add them to a PCA chart?

I have added some sample code below that gets up to the part just before the connecting lines. Any suggestions would be great. My last thought is to use ggforce or something along those lines.

library(factoextra)
data(iris)

res.pca <- prcomp(iris[,-5], scale=TRUE)

fviz_pca_ind(res.pca, label="none", alpha.ind=1, pointshape=19,habillage=iris$Species, addEllipses = TRUE, ellipse.level=0.95)

Some comments have suggested the sites below. They are close, but my case is a bit different: I am using a data frame in which one column holds the categories I want to use for the clusters.

link 1

link 2

Any possible suggestions would be much appreciated please.

neilfws
dkcongo

2 Answers

A quick and dirty hack is to build a data frame of edges from the ggplot data stored inside the output of fviz_pca_ind(), and then plot it with geom_segment().

Note that this may be visually suboptimal, because you usually want the edges drawn before the nodes so that they highlight (rather than hide) the node positions. But barring a rewrite of df_raw_pca_viz and the fviz plotting functions, this is a quick way to get what you asked for.

Try:

library(factoextra)
library(purrr)
library(dplyr)
data(iris)

res.pca <- prcomp(iris[,-5], scale=TRUE)

g1 <- fviz_pca_ind(res.pca, label="none", alpha.ind=1, pointshape=19,habillage=iris$Species, addEllipses = TRUE, ellipse.level=0.95)

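# Compute each group's centroid, then join it back onto every point
# (the `multiple` argument needs dplyr >= 1.1.0)
df_edges <-
  pluck(g1, "data") |>
  as_tibble() |>
  group_by(Groups) |>
  summarise(xend = mean(x), yend = mean(y)) |>
  left_join(y = pluck(g1, "data"),
            by = "Groups",
            multiple = "all")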

g1 +
  geom_segment(data = df_edges, aes(xend = xend, yend = yend, x = x, y = y, colour = Groups), alpha = 0.25)
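
The draw-order caveat mentioned above can be sidestepped by building the plot directly in ggplot2, so that the segments are added before the points. The following is only a sketch of that idea, not what factoextra does internally: the column names PC1_center and PC2_center and the object names scores, centroids and star are made up for illustration, and the ellipses from stat_ellipse() need not match the ones drawn by fviz_pca_ind().

```r
library(ggplot2)

# PCA scores from base R; Species is the grouping column of this example data
res.pca <- prcomp(iris[, -5], scale. = TRUE)
scores <- data.frame(res.pca$x[, 1:2], Groups = iris$Species)

# One centroid (mean PC1 / PC2) per group, attached to every point of that group
# (PC1_center / PC2_center are illustrative names, not part of any package)
centroids <- aggregate(cbind(PC1, PC2) ~ Groups, data = scores, FUN = mean)
names(centroids) <- c("Groups", "PC1_center", "PC2_center")
star <- merge(scores, centroids, by = "Groups")

p <- ggplot(star, aes(x = PC1, y = PC2, colour = Groups)) +
  # Segments go first, so the points added afterwards are drawn on top of them
  geom_segment(aes(xend = PC1_center, yend = PC2_center), alpha = 0.25) +
  geom_point() +
  stat_ellipse(level = 0.95) +
  # Mark each group centroid with a cross
  geom_point(data = centroids,
             aes(x = PC1_center, y = PC2_center),
             shape = 4, size = 3)
p
```

Because the grouping is just a column of the data frame, the same pattern should carry over to any data set where one column holds the categories.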

Nicolás Velasquez
  • Thanks for this idea! I like the loophole on this. I tried running this but I got an error stating "Error in is.data.frame(y) : object 'df_raw_pca_viz' not found" after using the code you provided. Does df_raw_pca_viz need to be defined? – dkcongo Jul 18 '23 at 03:26
  • Ah! I updated the answer to correct that mistake. Please try again. – Nicolás Velasquez Jul 18 '23 at 04:34
  • Amazing! Thank you so much! This helps me understand the workarounds on this! – dkcongo Jul 18 '23 at 04:48
I recently developed a user-friendly R package named "GABB" for producing simple, clean PCA plots, including segments from each data point to the barycenter of its group. Check the following example with the mtcars data set:

library(GABB)

## Example of GABB package pipeline with the base data.set "mtcars" 
my.data <- mtcars

## Data preparation for RDA and PCA: transformation and scaling of numeric/quantitative variables

prep_data(data = my.data, quantitative_columns = c(1:7), transform_data_method = "log", scale_data = TRUE)

## Create PCA
library(FactoMineR)
my.pca <- FactoMineR::PCA(X = data_quant) 


## Create, display and save graphic output of individual and variable PCA

#Basic output with minimum required parameters
PCA_RDA_graphics(complete.data.set = initial_data_with_quant_transformed, PCA.object = my.pca, factor.names = c("vs", "am", "gear", "carb"))

#Advanced outputs (image below)
PCA_RDA_graphics(complete.data.set = initial_data_with_quant_transformed, PCA.object = my.pca, 
                 factor.names = c("vs", "am", "gear", "carb"), Biplot.PCA = TRUE,col.arrow.var.PCA = "grey",
                 Barycenter = TRUE, Segments = TRUE, Ellipse.IC = TRUE,
                 Barycenter.Ellipse.Fac1 = "vs", Barycenter.Ellipse.Fac2 = "am",
                 factor.colors = "vs", factor.shapes = "am",
                 Barycenter.factor.col = "vs", Barycenter.factor.shape = "am")

Beeflight31