I have the following data:

   Sample_ID   SNP_Name Genotype Phenotype CV.Group
1     AUS002  rs1028005       AA         1        4
2     AUS002  rs4788050       TC         1        4
3     AUS002 rs17143930       CC         1        4
4     AUS002  rs3920214       AA         1        4
5     AUS002  rs1862520       GG         1        4
6     AUS002  rs1461224       AC         1        4

which I reshaped with the command below:

reshaped.data <- reshape(merged.data, timevar = "SNP_Name", idvar = c("Sample_ID","Phenotype","CV.Group"), direction = "wide")

It works and gives me what I want: one row per Sample_ID, where each SNP variable should contain only three categories (the genotypes).

      Sample_ID Phenotype CV.Group Genotype.rs1028005 Genotype.rs4788050
1        AUS002         1        4                 AA                 TC
4039     AUS003         1        3                 GG               <NA>
7927     AUS004         1        4                 AA                 TC
11965    AUS005         0        2                 AG                 TT
16003    AUS007         0        2                 AA                 TC

However, when I tabulate one of the variables, the table shows other levels as well, when there should be only three (for example AA, AG and GG). What went wrong?

table(reshaped.data$Phenotype,reshaped.data$Genotype.rs1028005)

  -- AA AC AG AT CC CG GC GG TA TC TG TT
0  0 45  0 35  0  0  0  0  4  0  0  0  0
1  0 16  0 12  0  0  0  0  3  0  0  0  0
David Arenburg
Shima

1 Answer

This looks like a case of unused factor levels left over after reshaping: each Genotype.* column keeps every level of the original Genotype factor, including those that never occur for that SNP. To remove the unused levels from a factor variable, either call factor on it again or use droplevels.

table(droplevels(reshaped.data$Phenotype),
                 droplevels(reshaped.data$Genotype.rs1028005))

Or just

 reshaped.data <- droplevels(reshaped.data)
 table(reshaped.data[,c('Phenotype', 'Genotype.rs1028005')])
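
To see where the extra columns come from, here is a minimal sketch on made-up data (the names long, wide, rsA and rsB are hypothetical, not the asker's). It assumes Genotype is stored as a factor in the long data, which the table output in the question suggests:

```r
# Toy long-format data, invented for illustration only
long <- data.frame(
  Sample_ID = rep(c("S1", "S2", "S3"), each = 2),
  SNP_Name  = rep(c("rsA", "rsB"), times = 3),
  Genotype  = factor(c("AA", "TC", "AG", "TT", "GG", "TC")),
  Phenotype = rep(c(1, 0, 1), each = 2)
)

# Same kind of reshape as in the question
wide <- reshape(long, timevar = "SNP_Name",
                idvar = c("Sample_ID", "Phenotype"),
                direction = "wide")

# The wide column inherits all levels of the long Genotype factor,
# not just the genotypes observed for this SNP
levels_before <- levels(wide$Genotype.rsA)
print(levels_before)

# droplevels() on the data frame trims unused levels in every factor column
wide <- droplevels(wide)
print(levels(wide$Genotype.rsA))

# Calling factor() again on a single column trims that column only
one_column <- factor(wide$Genotype.rsB)
print(levels(one_column))
```

Because table() builds one column per factor level, trimming the levels first is what removes the all-zero columns.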
akrun
    Thank you very much. It works fine with your suggestion. The second command helps me to do it for the entire 4000 SNPs. – Shima Aug 12 '15 at 10:10