
I have been given a data set in which the rows are gene names and the columns are samples. Each column name encodes the sample's genotype and its treatment (either A or B). How do I plot this as a PCA in R, grouped by treatment?

I have built the matrix I need, but I am completely stuck on how to proceed. So far I have made a PCA plot using:

library(tidyverse)
library(ggfortify)  # provides autoplot() methods for prcomp objects
df <- RGenes  # RGenes is what I named my matrix
pca_res2 <- prcomp(df, scale. = TRUE)
autoplot(pca_res2)
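One thing worth checking before colouring anything: prcomp() treats rows as observations. With genes as rows and samples as columns, prcomp(df) places one point per gene, whereas grouping by treatment needs one point per sample. A minimal sketch, assuming a small simulated matrix (the name rgenes and its values are hypothetical stand-ins for RGenes), transposing with t() first:

```r
# Hypothetical toy matrix standing in for RGenes: 50 genes (rows) x 4 samples (columns)
set.seed(1)
rgenes <- matrix(rnorm(50 * 4), nrow = 50,
                 dimnames = list(paste0("gene", 1:50),
                                 c("ABO_100K", "ABO_101L", "ABX_102K", "ABX_102L")))

# prcomp() treats rows as observations, so this PCA scores the 50 genes
pca_genes <- prcomp(rgenes, scale. = TRUE)
print(dim(pca_genes$x))    # one row (point) per gene

# Transpose first so that each sample becomes one observation (one point)
pca_samples <- prcomp(t(rgenes), scale. = TRUE)
print(dim(pca_samples$x))  # one row (point) per sample
print(rownames(pca_samples$x))  # the sample names, taken from the column headers
```

If the real matrix contains genes that are constant across all samples, prcomp(..., scale. = TRUE) stops with a "cannot rescale a constant/zero column to unit variance" error; dropping those rows first is one common workaround.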
camille
    [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with, including a sample of data we can work with – camille Aug 04 '21 at 20:36

1 Answer


It would be much easier with sample data, but, to start, you have to pass additional arguments to autoplot(): add your original data (data = df, which needs one row per observation in the PCA and a column holding the treatment) and add a colour mapping (colour = "treatment"). I don't have your data, so I assume you named the treatment variable "treatment".
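Putting that together, a minimal sketch, assuming ggfortify is installed, that samples are made the rows via t(), and that a hypothetical data frame sample_info holds one row per sample in the same order. The treatment labels here are typed in by hand purely for illustration:

```r
library(ggplot2)
library(ggfortify)  # registers autoplot() methods for prcomp objects

# Hypothetical toy data: 50 genes (rows) x 4 samples (columns), as in the question
set.seed(1)
rgenes <- matrix(rnorm(50 * 4), nrow = 50,
                 dimnames = list(paste0("gene", 1:50),
                                 c("ABO_100K", "ABO_101L", "ABX_102K", "ABX_102L")))

# One row per sample, in the same order as the rows of t(rgenes)
sample_info <- data.frame(
  sample    = colnames(rgenes),
  treatment = c("K", "L", "K", "L"),  # hypothetical labels typed in by hand
  stringsAsFactors = FALSE
)

pca_res <- prcomp(t(rgenes), scale. = TRUE)

# data = supplies the extra per-sample columns; colour = names the column to colour by
p <- autoplot(pca_res, data = sample_info, colour = "treatment")
print(p)
```

The colour argument is given as a column name string, following the pattern in ggfortify's own PCA examples.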

First, separate your column into two columns:

library(tidyr)  # separate() comes from tidyr, which also provides %>%

# data
test <- data.frame(val = c("ABX_102K", "ABX_102L", "ABO_100K", "ABO_101L"))

# separation: sep = -1 splits off the last character as the treatment
test <- test %>% separate(val, into = c("val", "treatment"), sep = -1)

After this, follow the instructions above.
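In the question the names to split are the column headers of the matrix rather than a column of values, so one way is to build the data frame from colnames() first. A sketch, assuming tidyr is available; rgenes is a hypothetical stand-in for RGenes, and the new column name id is made up:

```r
library(tidyr)  # separate(); tidyr also re-exports %>%

# Hypothetical stand-in for RGenes: genes as rows, samples as columns
rgenes <- matrix(1:8, nrow = 2,
                 dimnames = list(c("gene1", "gene2"),
                                 c("ABO_100K", "ABO_101L", "ABX_102K", "ABX_102L")))

# One row per sample, built from the column headers;
# sep = -1 splits off the last character, remove = FALSE keeps the full name
sample_info <- data.frame(sample = colnames(rgenes), stringsAsFactors = FALSE) %>%
  separate(sample, into = c("id", "treatment"), sep = -1, remove = FALSE)

print(sample_info)
```

Because the rows of sample_info follow the column order of the matrix, they line up with the rows of prcomp(t(...))$x, which is what lets data = sample_info colour the right points.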

Bloxx
  • Hi @Bloxx, sorry, I should have provided some sample data, my bad. Here is an example (apologies, I am new, so my formatting may be poor): ABO_100K / ABO_101L / ABX_102K / ABX_102L. These are example column headers, and the rows sit below them. My issue is that each column is unique, and I don't know how to colour and group by treatment, which is denoted by either K or L. – Charlie Aug 05 '21 at 13:26
  • In that case you first have to create a new variable, treatment. I will edit my first answer. – Bloxx Aug 05 '21 at 20:25