I've run a PCA on a moderately sized data set, but I only want to visualize a subset of the points from that analysis. They come from repeat observations, and I want to see how close the paired observations are to each other on the plot. I've arranged the data so that the first 18 individuals are the ones I want to plot, but I can't find a way to plot only those 18 points without restricting the analysis itself to those 18 instead of the whole data set (43 individuals).

# My data file
TrialsMR<-read.csv("NER_Trials_Matrix_Retrials.csv", row.names = 1)
# I ran the PCA of all of my values (without the categorical variable in col 8)
R.pca <- PCA(TrialsMR[,-8], graph = FALSE)
# When I try to plot only the first 18 individuals with this method, I get an error
fviz_pca_ind(R.pca[1:18,], 
             labelsize = 4, 
             pointsize = 1, 
             col.ind = TrialsMR$Bands, 
             palette = c("red", "blue", "black", "cyan", "magenta", "yellow", "gray", "green3", "pink" ))
# This is the error
Error in R.pca[1:18, ] : incorrect number of dimensions 

The 18 individuals are each paired up, so only using 9 colours shouldn't cause an error (I hope).

Could anyone help me plot just the first 18 points from a PCA of my whole data set?

My data frame looks similar to this in structure:

TrialsMR
      Trees Bushes Shrubs Bands
JOHN1     1      4     18  BLUE
JOHN2     2      6     25  BLUE
CARL1     1      3     12 GREEN
CARL2     2      4     15 GREEN
GREG1     1      1     15   RED
GREG2     3     11     26   RED
MIKE1     1      7     19  PINK
MIKE2     1      1     25  PINK

where each band corresponds to a specific individual that has been tested twice.

CBio
  • What package are you using to get PCA? That is not base R. – G5W Jun 17 '18 at 21:34
  • You're going to need to add a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example if people are going to be able to help with this – Conor Neilson Jun 17 '18 at 21:41

1 Answer

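Subsetting the PCA object with R.pca[1:18, ] fails because the object is a list, not a matrix, so it has no rows to index. Instead, pass the whole PCA object and use the select.ind argument of fviz_pca_ind to choose which individuals to draw, e.g.: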

data(iris)                                                  # test data

If you want to rename your rows according to a grouping criterion so that they are readily identifiable in the plot, do it before the PCA. For example, number setosa as 101-150, versicolor as 201-250 and virginica as 301-350:

new_series <- c(101:150, 201:250, 301:350)                # there are 50 of each 
rownames(iris) <- new_series
R.pca <- prcomp(iris[, 1:4], scale. = TRUE)               # PCA on all 150 rows

library(factoextra)

fviz_pca_ind(X= R.pca, labelsize = 4, pointsize = 1, 
             select.ind= list(name = new_series[1:120]),  # 120 out of 150 selected
             col.ind = iris$Species ,
             palette = c("blue", "red", "green" ))

Always refer to R documentation first before using a new function.

R documentation: fviz_pca {factoextra}

X
an object of class PCA [FactoMineR]; prcomp and princomp [stats]; dudi and pca [ade4]; expOutput/epPCA [ExPosition].

select.ind, select.var
a selection of individuals/variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib
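
One thing to watch for with select.ind: as far as I can tell, the values in list(name = ...) are matched against the row names of the data rather than row positions. If that holds, passing 1:18 to a data set whose rows are named JOHN1, JOHN2, ... selects nothing and gives an empty plot. Below is a minimal sketch of that check using the dummy data from the question. The variable names `wanted` and `by_position` are only for illustration, and the plot is drawn only if factoextra is installed.

```r
# Dummy data from the question; the first column becomes the row names
TrialsMR <- read.table(text = "Trees Bushes Shrubs Bands
JOHN1     1      4     18  BLUE
JOHN2     2      6     25  BLUE
CARL1     1      3     12 GREEN
CARL2     2      4     15 GREEN
GREG1     1      1     15   RED
GREG2     3     11     26   RED
MIKE1     1      7     19  PINK
MIKE2     1      1     25  PINK", header = TRUE)

# PCA on all individuals, numeric columns only
R.pca <- prcomp(TrialsMR[, 1:3], scale. = TRUE)

# Individuals to draw, identified by row name (first 4 rows)
wanted <- row.names(TrialsMR)[1:4]

# Row positions written as names do not match any row name here
by_position <- intersect(as.character(1:4), row.names(R.pca$x))
length(by_position)

# The row names do match the individuals stored in the PCA result
all(wanted %in% row.names(R.pca$x))

# Draw only the selected individuals (assumes factoextra is available)
if (requireNamespace("factoextra", quietly = TRUE)) {
  p <- factoextra::fviz_pca_ind(R.pca, select.ind = list(name = wanted))
  print(p)
}
```

If `by_position` comes back empty for your own data, build the selection from row.names() instead of from 1:18.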

For your particular dummy data, this should do:

 R.pca <- prcomp(TrialsMR[,1:3], scale. = TRUE)

 fviz_pca_ind(X= R.pca, 
              select.ind= list(name = row.names(TrialsMR)[1:4]),  # 4 out of 8
              pointsize = 1, labelsize = 4,
              col.ind = TrialsMR$Bands,
              palette = c("blue", "green" )) + ylim(-1,1)

DD PCA: TrialsMR

Dummy Data:

TrialsMR <- read.table( text = "Trees Bushes Shrubs Bands
JOHN1     1      4     18  BLUE
JOHN2     2      6     25  BLUE
CARL1     1      3     12 GREEN
CARL2     2      4     15 GREEN
GREG1     1      1     15   RED
GREG2     3     11     26   RED
MIKE1     1      7     19  PINK
MIKE2     1      1     25  PINK", header = TRUE)
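
If factoextra is not a requirement, a rough alternative (not the approach used above) is to take the scores from the full PCA and plot only the rows you want with base graphics. This sketch assumes a prcomp result, whose scores are stored in `$x`. `n_show` is a hypothetical name for the number of leading individuals to draw, and the colours are the default palette indexed by band rather than the band names themselves.

```r
# Dummy data from the question; the first column becomes the row names
TrialsMR <- read.table(text = "Trees Bushes Shrubs Bands
JOHN1     1      4     18  BLUE
JOHN2     2      6     25  BLUE
CARL1     1      3     12 GREEN
CARL2     2      4     15 GREEN
GREG1     1      1     15   RED
GREG2     3     11     26   RED
MIKE1     1      7     19  PINK
MIKE2     1      1     25  PINK", header = TRUE)

# PCA fitted on all 8 individuals
R.pca <- prcomp(TrialsMR[, 1:3], scale. = TRUE)

# Keep only the first n_show individuals for plotting (hypothetical name)
n_show <- 4
scores <- R.pca$x[seq_len(n_show), 1:2]            # PC1 and PC2 scores
bands  <- factor(TrialsMR$Bands[seq_len(n_show)])  # one level per pair

# One colour per band, so paired observations share a colour
plot(scores, col = as.integer(bands), pch = 19, xlab = "PC1", ylab = "PC2")
text(scores, labels = row.names(scores), pos = 3)
```

The PCA itself still uses every row; only the plotting step is restricted to the first `n_show` individuals.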
Mankind_008
  • Thanks for the response! My apologies that I didn't give the package I was using. This allows my code to run without any errors, but unfortunately it will not plot any of my points for some reason and I get a blank PC plot. 'fviz_pca_ind(R.pca, labelsize = 4, pointsize = 1, select.ind= list(name = 1:18), mean.point = FALSE, col.ind = TrialsMR$Bands, palette = c("red", "blue", "black", "cyan", "magenta", "yellow", "gray", "green3", "pink" ))' This is what I last ran that gives me the blank plot. If you had any additional suggestions for this issue it would be much appreciated! – CBio Jun 18 '18 at 14:26
  • Should work fine. I updated the code with a palette. What are the class and structure of your `TrialsMR$Bands` column? – Mankind_008 Jun 18 '18 at 18:09
  • It creates the plot, but there are no points on it for some reason. `TrialsMR$Bands` is a factor that is the same for two individuals in the data I want to plot, giving 9 pairs. – CBio Jun 18 '18 at 18:17
  • I need a dummy data frame with the same structure as your data to find out exactly what problem you are facing. Please provide it in your post. – Mankind_008 Jun 18 '18 at 18:23
  • Could you show an example of how you would do this with the iris data in your answer if possible? I'm a little confused where/when I should use that? – CBio Jun 18 '18 at 19:26