
I am trying to plot the results of a Principal Component Analysis (PCA) of my data, with age shown as a color-coded variable on the plot (as in the image linked below).

I want to make the plot easier to read by grouping ages into ranges instead of using a continuous color scale. For example, I would like three colors: group 1 for ages 0-25, group 2 for ages 26-50, and group 3 for ages 51+. Is there a way to modify this code to do that? I searched online, but I am still unsure how to apply what I found to this code.

# Plot the PCA with individuals color-coded by age
library(factoextra)  # provides fviz_pca_ind(); also attaches ggplot2
fviz_pca_ind(proteome_pr, geom.ind = "point", pointshape = 21,
  pointsize = 2,
  fill.ind = prot_subjectID$Age,
  col.ind = "black",
  pallette = "jco",
  label = "var", 
  col.var = "black",
  repel = TRUE,
  legend.title = "Age") + 
ggtitle("Proteome Data and Age Correlation") + theme(plot.title = element_text(hjust = 0.5))

Image of the generated figure

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. You can use `cut()` to create groups for age. – MrFlick Jul 07 '21 at 20:03
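A minimal sketch of the `cut()` suggestion applied to the three ranges from the question. The `age` vector is made-up stand-in data (not the asker's `prot_subjectID$Age`), and the `-Inf`/`Inf` outer breaks and the group labels are illustrative choices:

```r
# Hypothetical ages standing in for prot_subjectID$Age
age <- c(3, 25, 26, 40, 50, 51, 78)

# cut() uses right-closed intervals (a, b] by default, so these breaks give
#   (-Inf, 25] -> "0-25", (25, 50] -> "26-50", (50, Inf] -> "51+"
age_group <- cut(age,
                 breaks = c(-Inf, 25, 50, Inf),
                 labels = c("0-25", "26-50", "51+"))

print(table(age_group))
```

Because the intervals are closed on the right, a non-integer age such as 25.5 would land in "26-50"; adjust the breaks if ages are not whole numbers. The resulting factor could then be passed as `fill.ind` so that each group should get its own fill color.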

1 Answer


Following @MrFlick's suggestion, you can use `cut()` to bin the ages into groups:

library(factoextra)  # provides fviz_pca_ind(); also attaches ggplot2

fviz_pca_ind(proteome_pr, geom.ind = "point", pointshape = 21,
  pointsize = 2,
  fill.ind = cut(prot_subjectID$Age, breaks = 10), # cut() bins Age into 10 equal-width groups (a factor)
  col.ind = "black",
  palette = "jco",  # discrete palette for the groups; the argument is spelled "palette"
  label = "var",
  col.var = "black",
  repel = TRUE,
  legend.title = "Age") +
ggtitle("Proteome Data and Age Correlation") + theme(plot.title = element_text(hjust = 0.5))

That said, a minimal reproducible example would be very helpful.
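Note that an integer `breaks` (10 in the answer, 3 in the follow-up comment) asks `cut()` for that many equal-width intervals over the range of the data, so the boundaries are computed by `cut()` rather than chosen by you. A small sketch with made-up ages:

```r
# Made-up ages for illustration
age <- c(2, 18, 33, 47, 61, 75, 90)

# An integer 'breaks' requests that many equal-width intervals spanning
# range(age); here roughly 2-31.3, 31.3-60.7 and 60.7-90
groups3 <- cut(age, breaks = 3)
print(levels(groups3))
print(table(groups3))
```

This is why `breaks = 3` yields three groups, but not necessarily the 0-25 / 26-50 / 51+ split from the question.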

  • That was really helpful, thank you! Just a follow-up: using `breaks = 3`, I was able to split the data into three groups, but is there a way to set the break points to specific numerical values of my choosing? – SpiderK Jul 07 '21 at 20:16
  • Yes, you can replace the number of breaks with a vector of cut points, for example: `cut(prot_subjectID$Age, breaks = c(1, 10, 20, 30, 40))` – Enrique Del Callejo Canal Jul 07 '21 at 20:17
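One caveat with an explicit `breaks` vector like the one in that comment: with the default right-closed intervals, values at or below the first break or above the last break become `NA`. A sketch with made-up ages showing the problem and two ways around it (`include.lowest = TRUE` for the lowest edge, or `-Inf`/`Inf` as outer breaks):

```r
# Made-up ages for illustration
age <- c(1, 5, 15, 40, 41, 60)

# Breaks from the comment: intervals (1,10], (10,20], (20,30], (30,40]
g <- cut(age, breaks = c(1, 10, 20, 30, 40))
print(g)  # age 1 (not > 1) and ages 41, 60 (> 40) are NA

# include.lowest = TRUE closes the first interval as [1,10]
g_low <- cut(age, breaks = c(1, 10, 20, 30, 40), include.lowest = TRUE)

# Open-ended outer breaks put every finite age into some group
g_all <- cut(age, breaks = c(-Inf, 10, 20, 30, 40, Inf))
print(table(g_all, useNA = "ifany"))
```

For the question's ranges, `breaks = c(-Inf, 25, 50, Inf)` follows the same idea.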