I have the following data,
  Sample_ID   SNP_Name Genotype Phenotype CV.Group
1    AUS002  rs1028005       AA         1        4
2    AUS002  rs4788050       TC         1        4
3    AUS002 rs17143930       CC         1        4
4    AUS002  rs3920214       AA         1        4
5    AUS002  rs1862520       GG         1        4
6    AUS002  rs1461224       AC         1        4
which I reshaped with the command below:
reshaped.data <- reshape(merged.data, timevar = "SNP_Name", idvar = c("Sample_ID","Phenotype","CV.Group"), direction = "wide")
It works fine and gives me what I want: one row per Sample_ID, with one genotype column per SNP.
Each of these variables should contain only three categories (the genotypes observed for that SNP).
      Sample_ID Phenotype CV.Group Genotype.rs1028005 Genotype.rs4788050
1        AUS002         1        4                 AA                 TC
4039     AUS003         1        3                 GG               <NA>
7927     AUS004         1        4                 AA                 TC
11965    AUS005         0        2                 AG                 TT
16003    AUS007         0        2                 AA                 TC
However, when I tabulate one of the variables, the table shows other levels as well, when there should be only three (for example AA, AG and GG). What is going wrong?
table(reshaped.data$Phenotype,reshaped.data$Genotype.rs1028005)
    -- AA AC AG AT CC CG GC GG TA TC TG TT
  0  0 45  0 35  0  0  0  0  4  0  0  0  0
  1  0 16  0 12  0  0  0  0  3  0  0  0  0
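
A minimal sketch of what may be happening, assuming the Genotype column in merged.data is stored as a factor (the default for read.table/read.csv before R 4.0, but an assumption here). Each wide column would then keep the full level set of the original factor, and table() prints every level, including those with zero counts. The toy data frame and the names long, wide and geno_cols below are made up for illustration; only the column names mirror the data above.

```r
# Toy long-format data; values are invented for illustration.
long <- data.frame(
  Sample_ID = rep(c("S1", "S2", "S3"), each = 2),
  SNP_Name  = rep(c("rsA", "rsB"), times = 3),
  Genotype  = factor(c("AA", "TC", "AG", "TT", "GG", "TC")),
  Phenotype = rep(c(1, 0, 1), each = 2),
  stringsAsFactors = FALSE
)

wide <- reshape(long, timevar = "SNP_Name",
                idvar = c("Sample_ID", "Phenotype"),
                direction = "wide")

# Each wide column still carries all levels of the original Genotype factor.
levels(wide$Genotype.rsA)

# Drop unused levels column by column, leaving non-factor columns untouched.
geno_cols <- grep("^Genotype\\.", names(wide))
wide[geno_cols] <- lapply(wide[geno_cols], function(x) {
  if (is.factor(x)) droplevels(x) else x
})

# The table now has one column per genotype actually observed for this SNP.
table(wide$Phenotype, wide$Genotype.rsA)
```

If this assumption holds, the same lapply() call applied to the Genotype.* columns of reshaped.data should leave only the observed genotypes in each table; a "--" entry, if it marks missing calls, would still appear wherever it occurs for that SNP.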