I have xy data for gene expression in multiple samples. I wish to subset the first column so I can order the genes alphabetically and perform some other filtering.
> setwd("C:/Users/Will/Desktop/BIOL3063/R code assignment");
> df = read.csv('R-assignments-dataset.csv', stringsAsFactors = FALSE);
The first column is a list of gene names (A2M, AAAS, AACS etc.) and each remaining column is a different tissue sample, so the table shows the expression of each gene in each tissue sample.
The question being asked is "Sort the gene names alphabetically (A-Z) and print out the first 20 gene names."
My thought process would be to subset the first column (gene names) and then perform order() to sort alphabetically, after which I can use head() to print the first 20.
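To make the intent concrete, here is a minimal sketch of what I am hoping to do, written against a made-up toy data frame rather than my real file. The data frame `toy`, its column name `gene`, and all the values are hypothetical; in particular, this assumes the gene column has a name, which is exactly what my real data seems to lack.

```r
# Toy data frame; the gene names and expression values are invented for illustration.
toy <- data.frame(
  gene    = c("AACS", "A2M", "AAAS"),
  TissueA = c(1.2, 3.4, 5.6),
  TissueB = c(7.8, 9.0, 2.1),
  stringsAsFactors = FALSE
)

# Pull out the gene-name column as a character vector.
genes <- toy$gene

# order() returns the indices that would sort the vector; index with them to sort.
sorted_genes <- genes[order(genes)]

# Print up to the first 20 sorted gene names (only 3 exist in this toy example).
head(sorted_genes, 20)
```

On the toy data this behaves as I would expect, so my difficulty is getting hold of the gene names in the real data frame in the first place.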
However, when I try
> genes <- df[1]
it simply subsets the first column that has data in it (TCGA-A6-2672_TissueA) rather than the column of gene names to its left.
Also, this
> genes <- df[,df$col1];
> genes;
data frame with 0 columns and 7065 rows
> order(genes);
integer(0)
appears to create a list of gene names in RStudio's viewer, but I cannot perform any manipulation on it.
I am unable to correctly locate the first column in the data.frame, since it does not have a column header, and I have the same problem when trying to do the same thing with row 1 (sample names).
I'm a complete novice at R and this is part of an assignment I'm working on. It seems I'm missing something fundamental, but I cannot figure out what.
Cheers guys