
I am working on a large dataset (count data, species x samples) on which I ran a PCA. The result is a massive cloud of points, and I would like to color one given species to show where it sits in this cloud (species are my variables here). Here is what it looks like:

[image]

I use the factoextra package and visualize the variables with fviz_pca_var. Is there a way to select one particular species and display it in a different color from the others?

Thank you for your help

stefan
Droidux
  • There is in vanilla ggplot2, but I have no clue about these other packages you mention. – teunbrand Jul 08 '22 at 11:41
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jul 08 '22 at 12:53

2 Answers


I would not label every data point. Just use a legend and highlight your species in one color (e.g. red), with all other species in another (e.g. green).

You did not provide example data, so here is a solution using other sample data. Using factoextra (and FactoMineR), run the PCA on all numerical columns. Then create a new factor variable with a simple ifelse that separates your species from the rest, and pass that factor to habillage in fviz_pca_ind when plotting the first two dimensions of the PCA. See the code below for an example:

library(FactoMineR)
library(ggplot2)
library(factoextra)

data("iris")
iris2 <- iris[1:4]
head(iris2)

# PCA analysis to get PCs
iris.pca <- PCA(iris2, scale.unit = TRUE, graph = FALSE)

# Color individuals by the original Species factor via habillage
fviz_pca_ind(iris.pca, label = "none", habillage = iris$Species)

# Two-level factor: the species of interest vs. everything else
iris$new_species <- as.factor(ifelse(iris$Species == "virginica",
                                     "my_species", "other_species"))

# Highlight only one species; all others share a single color
fviz_pca_ind(iris.pca, label = "none", habillage = iris$new_species)
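
Since the question is about variables rather than individuals, the same two-level factor idea can probably be carried over to fviz_pca_var, which accepts a grouping factor through col.var. This is a minimal sketch: the iris columns stand in for the species, and "Petal.Width" as the variable of interest is a purely illustrative choice, not something taken from the question.

```r
library(FactoMineR)
library(factoextra)

data("iris")
iris.pca <- PCA(iris[, 1:4], scale.unit = TRUE, graph = FALSE)

# Variable names in the order used by the PCA result
var_names <- rownames(iris.pca$var$coord)

# Hypothetical variable of interest (assumption for this example)
target <- "Petal.Width"

# Two-level factor: the variable of interest vs. all other variables
highlight <- factor(ifelse(var_names == target, "my_species", "other_species"),
                    levels = c("my_species", "other_species"))

# col.var takes the factor; palette gives one color per factor level
p <- fviz_pca_var(iris.pca,
                  col.var = highlight,
                  palette = c("red", "grey60"),
                  legend.title = "Variable")
p
```

With a very dense cloud of variables, passing geom = "point" to fviz_pca_var may be easier to read than the default arrows and labels.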


NeuroNaut

If it's just a single point you want to color, perhaps:

library(tidyverse)
library(factoextra)
library(FactoMineR)

data("iris")

iris$assigned_colors <- NA
# Change the color of the 'individual of interest'
iris[9,]$assigned_colors <- "red"

iris.pca <- PCA(iris[,-c(5,6)], graph = FALSE)

fviz_pca_ind(iris.pca,
             geom = "point",
             geom.ind = "point") +
  geom_point(aes(color = iris$assigned_colors)) +
  scale_color_identity()
#> Warning: Removed 149 rows containing missing values (geom_point).

Created on 2022-07-08 by the reprex package (v2.0.1)

You can also label specific points (i.e. just the point of interest) using this approach, e.g.

library(tidyverse)
library(factoextra)
library(FactoMineR)

data("iris")

iris$assigned_colors <- NA
iris[9,]$assigned_colors <- "red"

iris$labels <- NA
iris[9,]$labels <- "point of interest"

iris.pca <- PCA(iris[,-c(5,6, 7)], graph = FALSE)

fviz_pca_ind(iris.pca,
             geom = "point",
             geom.ind = "point") +
  geom_point(aes(color = iris$assigned_colors)) +
  geom_text(aes(label = iris$labels), nudge_y = -0.2) +
  scale_color_identity()
#> Warning: Removed 149 rows containing missing values (geom_point).
#> Warning: Removed 149 rows containing missing values (geom_text).

Created on 2022-07-08 by the reprex package (v2.0.1)
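
For variables rather than individuals, a similar overlay can be sketched in plain ggplot2 by pulling the variable coordinates out of the PCA with get_pca_var. Plotting the point of interest from its own subset puts it on top of the cloud and avoids the missing-value warnings above. The variable name "Sepal.Width" and the helper column names (x, y, name, is_target) are assumptions made for this example only.

```r
library(FactoMineR)
library(factoextra)
library(ggplot2)

data("iris")
iris.pca <- PCA(iris[, 1:4], graph = FALSE)

# Coordinates of the variables on the first two dimensions
var_coord <- as.data.frame(get_pca_var(iris.pca)$coord[, 1:2])
colnames(var_coord) <- c("x", "y")
var_coord$name <- rownames(var_coord)

# Hypothetical variable of interest (assumption for this example)
target <- "Sepal.Width"
var_coord$is_target <- var_coord$name == target

p <- ggplot(var_coord, aes(x, y)) +
  # Background cloud: every variable except the one of interest
  geom_point(data = subset(var_coord, !is_target), colour = "grey70") +
  # Variable of interest, drawn last so it sits on top
  geom_point(data = subset(var_coord, is_target), colour = "red", size = 3) +
  geom_text(data = subset(var_coord, is_target),
            aes(label = name), nudge_y = 0.08) +
  coord_equal() +
  labs(x = "Dim 1", y = "Dim 2")
p
```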

jared_mamrot