I have several processed microarray datasets (normalized, .txt files) from which I want to extract a list of 300 candidate genes (ILMN_IDs). In the output I need not only the gene names but also the expression values and statistics (already present in the original file). I have 2 data frames:
normalizedData, with the identifiers (gene names) in the first column, named "Name".
candidateGenes, with a single column named "Name", containing the identifiers.
I've tried:
1)
all=normalizedData
subset=candidateGenes
x=all%in%subset
2)
all[which(all$gene_id %in% subset)]  # as suggested in another bioinformatics forum
The second attempt returns a data frame with 0 columns and >4000 rows. This is not correct, since normalizedData has 24 columns. I have tried other ways to compare them, but I always get an error.
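A likely explanation for the "0 columns" result, sketched on toy data: the data frame has no gene_id column, so all$gene_id is NULL, the comparison yields an empty index, and single-bracket indexing without a comma selects columns rather than rows. The names all_toy and subset_toy and all values below are made up for illustration; they are not the real data.

```r
# Toy stand-in for normalizedData (hypothetical values)
all_toy <- data.frame(
  Name  = c("ILMN_1", "ILMN_2", "ILMN_3"),
  meant = c(10.5, 20.1, 30.7),
  meanc = c(1.2, 2.3, 3.4)
)
# Toy stand-in for candidateGenes (hypothetical values)
subset_toy <- data.frame(V1 = c("ILMN_1", "ILMN_3"))

# There is no gene_id column, so $ returns NULL
missing_col <- all_toy$gene_id

# NULL %in% anything is logical(0), and which() of that is integer(0)
idx <- which(missing_col %in% subset_toy)

# No comma inside [ ]: the index is applied to COLUMNS,
# so an empty index keeps every row but drops every column
res <- all_toy[idx]
dim(res)
```

On the toy data, dim(res) reports 3 rows and 0 columns, which mirrors the shape of the result described above.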
The key is to be able to compare the first column of all ("Name") with subset. Here is the info:
> class(all)
[1] "data.frame"
> dim(all)
[1] 4312 24
> str(all)
'data.frame': 4312 obs. of 24 variables:
$ Name: Factor w/ 4312 levels "ILMN_1651253": 3401..
$ meanbgt: num 0 ..
$ meanbgc: num ..
$ cvt: num 0.11 ..
$ cvc: num 0.23 ..
$ meant: num 4618 ..
$ stderrt: num 314.6 ..
$ meanc: num 113.8 ...
$ stderrc: num 15.6 ...
$ ratio: num 40.6 ...
$ ratiose: num 6.21 ...
$ logratio: num 5.34 ...
$ tp: num 1.3e-04 ...
$ t2p: num 0.00476 ...
$ wilcoxonp: num 0.0809 ...
$ tq: num 0.0256 ...
$ t2q: num 0.165 ...
$ wilcoxonq: num 0.346 ...
$ limmap: num 4.03e-10 ...
$ limmapa: num 4.34e-06 ...
$ SYMBOL: Factor w/ 3696 levels "","A2LD1",..
$ ENSEMBL: Factor w/ 3143 levels "ENSG00000000003",..
and here is the info about subset:
> class(subset)
[1] "data.frame"
> dim(subset)
[1] 328 1
> str(subset)
'data.frame': 328 obs. of 1 variable:
$ V1: Factor w/ 328 levels "ILMN_1651429",..: 177 286 47 169 123 109 268 284 234 186 ...
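Given the structures above, one way the comparison could be written is to match the Name column against the single column of the candidate list and use the result to index rows. This is a minimal sketch on toy data, assuming the candidate column is called V1 as the str() output shows (not "Name"); all_toy, subset_toy and the values are hypothetical.

```r
# Toy stand-ins for the two data frames (hypothetical values)
all_toy <- data.frame(
  Name     = factor(c("ILMN_1", "ILMN_2", "ILMN_3", "ILMN_4")),
  meant    = c(10.5, 20.1, 30.7, 40.2),
  logratio = c(5.3, 0.2, 1.8, 2.4)
)
subset_toy <- data.frame(V1 = factor(c("ILMN_3", "ILMN_1", "ILMN_9")))

# Compare column against column (not data frame against data frame).
# as.character() is a defensive choice so factor levels play no role.
keep <- as.character(all_toy$Name) %in% as.character(subset_toy$V1)

# Note the comma: rows are filtered, all columns are kept
hits <- all_toy[keep, ]

# Candidate IDs that do not occur in the normalized data
not_found <- setdiff(as.character(subset_toy$V1), as.character(all_toy$Name))

print(hits)
print(not_found)
```

With the real objects this would correspond to something like all[all$Name %in% subset$V1, ]. A merge(all, subset, by.x = "Name", by.y = "V1") call is another option for the same selection, though it reorders the rows by the key.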
I really appreciate your help!