0

I have a dataset that consists of rows of 896 SNP in a genome. Then I have a probability value of each SNP coming from a specific founder. The founders are grouped in groups of 8.

I want to check for each SNP probability if the maximum value has a tie, so check for the highest value in each group of 8 founders and see if it is repeated. I am only interested in 2-way ties. Then I want to get the value of the tie, the SNP in which it appears and which founders have it.

The problem is that I have the data given to me in a way that all founder groups are in columns repeated next to each other with a layout like this :

SNP P1_1 P1_2 P1_3 P1_4 P1_5 P1_6 P1_7 P1_8 ... Pn_1 Pn_2 Pn_3 Pn_4 Pn_5 Pn_6 Pn7_ Pn8
Asd 
FDA
FDG
GDE
GBD
SDF

I have worked it out by using loops, but in total for my dataset it takes 17 hours(!!!) to go through it, 896 SNPs and 600+ groups of 8 founders each.

Is there anyway to make it more efficient ? I was thinking something like melt() and having the final dataset in groups of 8 columns ? Would that be possible and even so would it make it more efficient ?

Here is what I have worked out so far:

require(reshape2)
A4 <- read.csv("sample.csv", header=TRUE)


SNP_count <-length(A4[,1])

A5 <- melt(A4)

#Give single index from 1-8 to each FOUNDER
A5[,4]<-rep(rep(seq(1,8,by=1),each=SNP_count),times=length(A5[,1])/SNP_count/8)

#Give a groupind number to each row so we know in which FOUNDER group it belongs
A5[,5]<-rep(seq(1,length(A5[,1])/SNP_count/8,by=1),each=SNP_count*8)

A5[,6]<-rep(seq(1,SNP_count))

colnames(A5) <- c("SNP","FOUNDER","PROB","FOUNDER_ID","GROUP_ID", "SNP_ID")

maxes<-NULL
x=1
i=1

for (x in 1:length(unique(A5$GROUP_ID))) { 

 for (i in 1:length(unique(A5$SNP_ID))) {
   e<-NULL

    e <- A5$PROB[A5$GROUP_ID==x & A5$SNP_ID==i] #save the rows of values here for each founder group

     if(length(which(e == max(e))) == 2){
       len <- length(which(e == max(e)))
       max1 <- max(e)
       founder <- as.character(A5[,2][A5$GROUP_ID==x & A5$SNP_ID==i])[1]
       SNP_INFO <- as.character(A5[,1][A5$GROUP_ID==x & A5$SNP_ID==i][1])
       ties <- length(which(e == max(e)))
       tie1 <- which(e==max(e))[1]
       tie2 <- which(e==max(e))[2]
       maxes <- rbind(maxes, c(e, SNP_INFO, ties, tie1, tie2, max1, founder))
       print(e)

}
 }
  }

From testing it seems that the line that makes it really slow is this :

e<-A5$PROB[A5$GROUP_ID==x & A5$SNP_ID==i]

Any help from anyone here would be amazingly appreciated.

dput results given below

structure(list(SNP = structure(c(571L, 690L, 7L, 204L, 317L, 
39L, 559L, 657L, 221L, 470L, 576L, 49L, 413L, 460L, 606L, 348L, 
111L, 129L, 30L, 290L, 291L, 767L, 96L, 172L, 38L, 48L, 54L, 
144L, 273L, 644L), .Label = c("BobWhite_c11298_512", "BobWhite_c11935_137", 
"BobWhite_c12428_371", "BobWhite_c1740_97", "BobWhite_c17852_511", 
"BobWhite_c17879_519", "BobWhite_c18256_105", "BobWhite_c18408_199", 
"BobWhite_c18593_955", "BobWhite_c1895_1953", "BobWhite_c21397_148", 
"BobWhite_c28950_147", "BobWhite_c29419_116", "BobWhite_c30009_285", 
"BobWhite_c30232_154", "BobWhite_c31500_144", "BobWhite_c34548_96", 
"BobWhite_c34866_232", "BobWhite_c35093_176", "BobWhite_c35303_192", 
"BobWhite_c42102_237", "BobWhite_c43213_184", "BobWhite_c43681_334", 
"BobWhite_c45534_535", "BobWhite_c46361_331", "BobWhite_c47722_613", 
"BobWhite_c5337_225", "BobWhite_c8340_511", "BobWhite_c9704_273", 
"BobWhite_c9961_402", "BobWhite_c9992_862a", "BobWhite_rep_c49102_169", 
"BobWhite_rep_c55551_53", "BobWhite_s65081_93", "BobWhite_s67516_159", 
"BS00000445_51", "BS00001478_51", "BS00003733_51", "BS00003932_51", 
"BS00003935_51", "BS00003956_51", "BS00003964_51", "BS00003971_51", 
"BS00006822_51", "BS00007502_51", "BS00009649_51", "BS00009657_51", 
"BS00009793_51", "BS00010059_51", "BS00010204_51", "BS00010531_51", 
"BS00010854_51", "BS00011231_51", "BS00011516_51", "BS00011612_51", 
"BS00012551_51", "BS00013584_51", "BS00017988_51", "BS00020459_51", 
"BS00021694_51", "BS00021871_51", "BS00021909_51", "BS00021981_51", 
"BS00022016_51", "BS00022029_51", "BS00022148_51", "BS00022159_51", 
"BS00022182_51", "BS00022256_51", "BS00022273_51", "BS00022368_51", 
"BS00022419_51", "BS00022424_51", "BS00022459_51", "BS00022516_51", 
"BS00022528_51", "BS00022586_51", "BS00022703_51", "BS00022746_51", 
"BS00022804_51", "BS00022845_51", "BS00022882_51", "BS00022884_51", 
"BS00022908_51", "BS00022968_51", "BS00022989_51", "BS00023026_51", 
"BS00023028_51", "BS00023189_51", "BS00023222_51", "BS00023337_51", 
"BS00023511_51", "BS00025191_51", "BS00026189_51", "BS00026396_51", 
"BS00027516_51", "BS00028660_51", "BS00029361_51", "BS00029569_51", 
"BS00030652_51", "BS00030876_51", "BS00032025_51", "BS00032524_51", 
"BS00032525_51", "BS00033658_51", "BS00034334_51", "BS00035364_51", 
"BS00035576_51", "BS00036168_51", "BS00036492_51", "BS00037189_51", 
"BS00037400_51", "BS00037537_51", "BS00038663_51", "BS00038664_51", 
"BS00039357_51", "BS00039498_51", "BS00040798_51", "BS00041121_51", 
"BS00041462_51", "BS00041742_51", "BS00042191_51", "BS00045327_51", 
"BS00047668_51", "BS00047836_51", "BS00048031_51", "BS00048633_51", 
"BS00048757_51", "BS00049420_51", "BS00049636_51", "BS00049637_51", 
"BS00049977_51", "BS00055211_51", "BS00056089_51", "BS00057444_51", 
"BS00057445_51", "BS00057523_51", "BS00057524_51", "BS00060029_51", 
"BS00060391_51", "BS00061173_51", "BS00061179_51", "BS00062645_51", 
"BS00062673_51", "BS00062808_51", "BS00063163_51", "BS00063300_51", 
"BS00063696_51", "BS00064039_51", "BS00064422_51", "BS00064487_51", 
"BS00064692_51", "BS00064703_51", "BS00065468_51", "BS00065673_51", 
"BS00065734_51", "BS00065840_51", "BS00065932_51", "BS00065956_51", 
"BS00066230_51", "BS00066475_51", "BS00066585_51", "BS00066714_51", 
"BS00066800_51", "BS00067190_51", "BS00067215_51", "BS00067216_51", 
"BS00067224_51", "BS00067499_51", "BS00067711_51", "BS00067748_51", 
"BS00067777_51", "BS00067940_51", "BS00067997_51", "BS00068094_51", 
"BS00068508_51", "BS00068520_51", "BS00069355_51", "BS00070464_51", 
"BS00070511_51", "BS00070870_51", "BS00071422_51", "BS00071823_51", 
"BS00071948_51", "BS00072186_51", "BS00072296_51", "BS00073009_51", 
"BS00073525_51", "BS00073854_51", "BS00074617_51", "BS00074926_51", 
"BS00075119_51", "BS00075515_51", "BS00075598_51", "BS00076772_51", 
"BS00077819_51", "BS00078430_51", "BS00078669_51", "BS00080546_51", 
"BS00080879_51", "BS00081418_51", "BS00081475_51", "BS00081610_51", 
"BS00081981_51", "BS00082982_51", "BS00083279_51", "BS00083398_51", 
"BS00084158_51", "BS00084159_51", "BS00085967_51", "BS00086050_51", 
"BS00086051_51", "BS00086247_51", "BS00088755_51", "BS00088756_51", 
"BS00089400_51", "BS00090225_51", "BS00090405_51", "BS00091002_51", 
"BS00091887_51", "BS00092728_51", "BS00092735_51", "BS00093252_51", 
"BS00093377_51", "BS00093871_51", "BS00093889_51", "BS00094366_51", 
"BS00094523_51", "BS00095228_51", "BS00095423_51", "BS00096642_51", 
"BS00097265_51", "BS00098840_51", "BS00099980_51", "BS00100626_51", 
"BS00101363_51", "BS00101401_51", "BS00104401_51", "BS00105963_51", 
"BS00105964_51", "BS00106008_51", "BS00106932_51", "BS00107444_51", 
"BS00109084_51", "BS00109700_51", "BS00109935_51", "BS00110266_51", 
"BS00110405_51", "BS00110452_51", "BS00110550_51", "BS00110564_51", 
"BS00110638_51a", "CAP11_c1022_117", "CAP11_c1022_66", "CAP11_c3472_60", 
"CAP11_c6193_232", "CAP11_rep_c6920_161", "CAP12_c1840_108", 
"CAP12_c1860_280", "CAP7_c1238_51", "CAP7_c2141_129", "CAP7_c3178_52", 
"CAP7_c3367_68", "CAP7_c915_121", "CAP7_rep_c12537_81", "CAP8_c1393_327", 
"CAP8_c359_95", "CAP8_c665_242", "CAP8_c665_409", "CAP8_rep_c3652_80", 
"CAP8_rep_c6382_237", "D_contig22919_290", "D_F5XZDLF02GN47Z_208", 
"D_GBB4FNX02FX302_98", "Ex_c104581_457", "Ex_c15087_564", "Ex_c17586_108", 
"Ex_c18484_2026", "Ex_c18484_2048", "Ex_c24554_1583", "Ex_c45438_377", 
"Ex_c5858_1681", "Ex_c5858_953", "Ex_c66357_866", "Ex_c67794_487", 
"Ex_c8871_1318", "Ex_c9685_1264", "Excalibur_c10079_1585", "Excalibur_c10383_432", 
"Excalibur_c11079_749", "Excalibur_c11079_923", "Excalibur_c11505_155", 
"Excalibur_c12875_864", "Excalibur_c13242_1178", "Excalibur_c13709_2568", 
"Excalibur_c1529_1644", "Excalibur_c18089_1116", "Excalibur_c20250_313", 
"Excalibur_c21051_515", "Excalibur_c24123_1597", "Excalibur_c24123_165", 
"Excalibur_c2419_531", "Excalibur_c24354_465", "Excalibur_c24829_189", 
"Excalibur_c25239_283", "Excalibur_c26754_433", "Excalibur_c29205_537", 
"Excalibur_c29600_173", "Excalibur_c31571_136", "Excalibur_c33545_134", 
"Excalibur_c39248_485", "Excalibur_c39309_171", "Excalibur_c39508_88", 
"Excalibur_c39808_453", "Excalibur_c40694_473", "Excalibur_c41477_1272", 
"Excalibur_c42951_136", "Excalibur_c42978_149", "Excalibur_c46082_440", 
"Excalibur_c47078_512", "Excalibur_c47907_517", "Excalibur_c49743_97", 
"Excalibur_c5097_1468", "Excalibur_c52446_519", "Excalibur_c5561_1013", 
"Excalibur_c56240_176", "Excalibur_c60452_196", "Excalibur_c60581_62", 
"Excalibur_c62042_175", "Excalibur_c64302_103", "Excalibur_c6501_477", 
"Excalibur_c9000_209", "Excalibur_c91154_164", "Excalibur_c91154_175", 
"Excalibur_c96921_206", "Excalibur_c9811_131", "Excalibur_c98205_83", 
"Excalibur_rep_c103091_266", "Excalibur_rep_c103408_632a", "Excalibur_rep_c104884_1325", 
"Excalibur_rep_c104884_417", "Excalibur_rep_c105085_102", "Excalibur_rep_c105978_544", 
"Excalibur_rep_c110501_525", "Excalibur_rep_c112985_337", "Excalibur_rep_c116073_259", 
"Excalibur_rep_c67269_496", "Excalibur_rep_c69004_1008", "Excalibur_s113043_59", 
"GENE_1464_73", "GENE_1549_110", "GENE_1606_307", "GENE_1634_405", 
"GENE_1661_228", "GENE_1674_631", "GENE_1820_661", "GENE_1826_718", 
"GENE_1838_79", "GENE_2052_186", "GENE_2087_300", "GENE_3343_183", 
"GENE_3665_61", "GENE_3939_576", "GENE_3939_653", "GENE_4252_246", 
"GENE_4795_75", "IAAV1155", "IAAV1334", "IAAV1410", "IAAV1523", 
"IAAV2646", "IAAV3826", "IAAV3900", "IAAV4190", "IAAV4286", "IAAV4781", 
"IAAV488", "IAAV5030", "IAAV5207", "IAAV5370", "IAAV5507", "IAAV5729", 
"IAAV5821", "IAAV5984", "IAAV6317", "IAAV6474", "IAAV6676", "IAAV6974", 
"IAAV7454", "IAAV7930", "IAAV8170", "IAAV8218", "IAAV8351", "IAAV8407", 
"IAAV8768", "IAAV8924", "IAAV902", "IAAV9044", "IAAV9068", "IACX11403", 
"IACX2728", "IACX2831", "IACX464", "IACX5909", "IACX5968", "IACX6054", 
"IACX6065", "IACX6092", "IACX6277", "IACX8519", "Jagger_c367_427", 
"Jagger_c5842_90", "Jagger_c6722_104", "Jagger_c765_61", "Jagger_c791_62", 
"JD_c1187_1398", "JG_c3077_225", "Ku_c103671_362", "Ku_c12191_1202", 
"Ku_c13307_1116", "Ku_c14313_1194", "Ku_c17569_905", "Ku_c18096_552", 
"Ku_c19456_645", "Ku_c26872_269", "Ku_c28597_514", "Ku_c41007_116", 
"Ku_c4231_940", "Ku_c56370_1155", "Ku_c64839_240", "Ku_c68484_1276", 
"Ku_c69970_624", "Ku_c69970_631", "Ku_c70534_1215", "Ku_c73010_143", 
"Kukri_c10210_1387", "Kukri_c10210_1700", "Kukri_c10511_94", 
"Kukri_c10751_1031", "Kukri_c10751_158", "Kukri_c10751_264", 
"Kukri_c10751_860", "Kukri_c10977_990", "Kukri_c11709_874", "Kukri_c12079_204", 
"Kukri_c12212_182", "Kukri_c13793_1152", "Kukri_c14012_358", 
"Kukri_c14029_117", "Kukri_c14971_367", "Kukri_c15950_244", "Kukri_c17313_667", 
"Kukri_c17966_561", "Kukri_c20889_526", "Kukri_c2123_1254", "Kukri_c23388_695", 
"Kukri_c23833_181", "Kukri_c2454_59", "Kukri_c25564_185", "Kukri_c28650_111", 
"Kukri_c28917_96", "Kukri_c30370_79", "Kukri_c31546_66", "Kukri_c33640_640", 
"Kukri_c34195_357", "Kukri_c34845_242", "Kukri_c35426_507", "Kukri_c36418_392", 
"Kukri_c3673_2442", "Kukri_c41361_186", "Kukri_c4324_74", "Kukri_c4568_1708", 
"Kukri_c46833_105", "Kukri_c47643_920", "Kukri_c49927_151", "Kukri_c51247_322", 
"Kukri_c51666_401", "Kukri_c5459_334", "Kukri_c54593_543", "Kukri_c54729_181", 
"Kukri_c55381_67", "Kukri_c62467_362", "Kukri_c6288_364", "Kukri_c63259_151", 
"Kukri_c63361_97", "Kukri_c64268_101", "Kukri_c68006_282", "Kukri_c80104_809", 
"Kukri_c82097_197", "Kukri_c82117_279", "Kukri_c8242_730", "Kukri_c8400_2315", 
"Kukri_c8465_54", "Kukri_c97654_90", "Kukri_rep_c102953_304", 
"Kukri_rep_c103783_1380", "Kukri_rep_c105819_114", "Kukri_rep_c107195_111", 
"Kukri_rep_c109051_116", "Kukri_rep_c110312_376", "Kukri_rep_c112061_617", 
"Kukri_rep_c114028_94", "Kukri_rep_c69028_139", "Kukri_rep_c69028_1398", 
"Kukri_rep_c69028_347", "Kukri_rep_c69970_717", "Kukri_rep_c70441_132", 
"Kukri_rep_c70479_411", "Kukri_rep_c71916_1548", "Kukri_rep_c73936_154", 
"Kukri_rep_c86903_184", "Kukri_rep_c87640_135", "Kukri_rep_c89183_282", 
"Kukri_rep_c89509_83", "Kukri_s110037_62", "Kukri_s117946_404", 
"Ra_c11263_2353", "Ra_c14565_1056", "Ra_c1619_250", "Ra_c1619_432", 
"Ra_c20211_1415", "Ra_c23771_496", "Ra_c38505_544", "Ra_c38505_555", 
"Ra_c3994_598", "Ra_c42858_503", "Ra_c42858_91", "Ra_c48730_581", 
"Ra_c5515_1723", "Ra_c5515_2396", "Ra_c5515_2469", "Ra_c70087_485", 
"Ra_c72106_227", "Ra_c73278_1234", "Ra_c88203_1055", "Ra_c88203_455", 
"Ra_c88203_468", "RAC875_c10194_673", "RAC875_c10194_910", "RAC875_c10286_1247", 
"RAC875_c10430_672", "RAC875_c14684_1128", "RAC875_c15109_106", 
"RAC875_c1619_120", "RAC875_c1666_126", "RAC875_c17907_181", 
"RAC875_c18570_893", "RAC875_c19138_402", "RAC875_c19570_671", 
"RAC875_c20134_535", "RAC875_c22333_504", "RAC875_c23470_220", 
"RAC875_c2464_274", "RAC875_c24982_202", "RAC875_c26514_127", 
"RAC875_c27323_867", "RAC875_c2786_1097", "RAC875_c30443_966", 
"RAC875_c3141_214", "RAC875_c31572_251", "RAC875_c31572_281", 
"RAC875_c3427_734", "RAC875_c34649_380", "RAC875_c35074_452", 
"RAC875_c36922_829", "RAC875_c371_251", "RAC875_c37118_595", 
"RAC875_c39728_187", "RAC875_c45016_79", "RAC875_c46323_1996", 
"RAC875_c46403_277", "RAC875_c47550_437", "RAC875_c47976_291", 
"RAC875_c50787_113", "RAC875_c50864_1921", "RAC875_c51781_238", 
"RAC875_c52195_324", "RAC875_c5310_1729", "RAC875_c53864_310", 
"RAC875_c55313_89", "RAC875_c58159_521", "RAC875_c6445_275", 
"RAC875_c66845_466", "RAC875_c66892_58", "RAC875_c67309_634", 
"RAC875_c67998_96", "RAC875_c68056_81", "RAC875_c75663_115", 
"RAC875_c787_431", "RAC875_c79551_167", "RAC875_c8010_155", "RAC875_c8174_1034", 
"RAC875_c82493_67", "RAC875_c842_1476", "RAC875_c842_1516", "RAC875_c99055_69", 
"RAC875_rep_c105428_270", "RAC875_rep_c108615_475", "RAC875_rep_c109228_400", 
"RAC875_rep_c109288_122", "RAC875_rep_c109433_782", "RAC875_rep_c111243_125", 
"RAC875_rep_c111698_383", "RAC875_rep_c111952_436", "RAC875_rep_c117959_132", 
"RAC875_rep_c69730_522", "RAC875_rep_c70746_582", "RAC875_rep_c75557_177", 
"RAC875_rep_c75768_245", "RAC875_rep_c77065_1532", "RAC875_rep_c77148_311", 
"RAC875_rep_c81701_424", "RAC875_rep_c90531_283", "RAC875_s120038_124", 
"RFL_Contig102_119", "RFL_Contig1175_354", "RFL_Contig1435_886", 
"RFL_Contig1488_671", "RFL_Contig1798_1606", "RFL_Contig2394_439", 
"RFL_Contig4282_1420", "RFL_Contig4399_956", "RFL_Contig4403_1034", 
"RFL_Contig4921_2420", "RFL_Contig497_1114", "RFL_Contig5153_2667", 
"RFL_Contig5153_958", "RFL_Contig5871_1650", "TA001038_0975", 
"TA001383_0516", "TA002103_1262", "TA002702_0616", "TA003179_0872", 
"TA003281_2379", "TA003550_0145", "TA003589_0518", "TA005161_0899", 
"TA005199_0585", "TA005216_0272", "TA015264_0958", "Tdurum_contig10307_375", 
"Tdurum_contig11121_739", "Tdurum_contig11121_834", "Tdurum_contig12008_842", 
"Tdurum_contig13011_381", "Tdurum_contig13548_158", "Tdurum_contig25520_363", 
"Tdurum_contig27982_568", "Tdurum_contig28699_54", "Tdurum_contig31235_99", 
"Tdurum_contig33100_127", "Tdurum_contig42495_389", "Tdurum_contig45726_1116", 
"Tdurum_contig5009_349", "Tdurum_contig5009_392", "Tdurum_contig5009_735", 
"Tdurum_contig50376_375", "Tdurum_contig50392_1355", "Tdurum_contig50577_620", 
"Tdurum_contig50617_713", "Tdurum_contig51995_377", "Tdurum_contig51995_474", 
"Tdurum_contig55443_1361", "Tdurum_contig56373_348", "Tdurum_contig56731_335", 
"Tdurum_contig57139_318", "Tdurum_contig57693_581", "Tdurum_contig57753_138", 
"Tdurum_contig58326_467", "Tdurum_contig59531_892", "Tdurum_contig59531_914", 
"Tdurum_contig59585_576", "Tdurum_contig59585_656", "Tdurum_contig59782_86", 
"Tdurum_contig61299_55", "Tdurum_contig62502_90", "Tdurum_contig62557_263", 
"Tdurum_contig63129_238", "Tdurum_contig67686_1149", "Tdurum_contig67686_1204", 
"Tdurum_contig67686_633", "Tdurum_contig67686_792", "Tdurum_contig67686_851", 
"Tdurum_contig68305_703", "Tdurum_contig68305_796", "Tdurum_contig68855_91", 
"Tdurum_contig697_73", "Tdurum_contig76105_124", "Tdurum_contig76105_201", 
"Tdurum_contig83209_316", "Tdurum_contig83663_371", "Tdurum_contig91865_242", 
"Tdurum_contig93570_848", "Tdurum_contig9912_216", "Tdurum_contig9912_228", 
"Tdurum_contig9912_451", "tplb0050h15_1287", "tplb0055d10_1024", 
"wsnp_BE426222A_Ta_2_1", "wsnp_BE443568A_Ta_2_1", "wsnp_BE443568A_Ta_2_2", 
"wsnp_BE443995B_Ta_2_2", "wsnp_BE490613A_Ta_2_1", "wsnp_BE494474A_Ta_2_2", 
"wsnp_BE604885A_Ta_2_1", "wsnp_BE604885A_Ta_2_2", "wsnp_BF293133A_Ta_2_2", 
"wsnp_BF429272A_Ta_2_1", "wsnp_BG262734A_Ta_2_3", "wsnp_BG262734A_Ta_2_7", 
"wsnp_BM137927A_Ta_2_1", "wsnp_BQ171931A_Ta_2_1", "wsnp_CAP11_c2438_1258747", 
"wsnp_CAP11_c318_261649", "wsnp_CAP11_rep_c4157_1965583", "wsnp_CAP11_rep_c4226_1995152", 
"wsnp_CAP12_c15_9559", "wsnp_CAP12_rep_c8867_3720285", "wsnp_CAP8_c6939_3242530", 
"wsnp_Ex_c10014_16477392", "wsnp_Ex_c10272_16842803", "wsnp_Ex_c10630_17338753", 
"wsnp_Ex_c10667_17387885", "wsnp_Ex_c11039_17902115", "wsnp_Ex_c11085_17973016", 
"wsnp_Ex_c11229_18163892", "wsnp_Ex_c11397_18400400", "wsnp_Ex_c1141_2191485", 
"wsnp_Ex_c1149_2206471", "wsnp_Ex_c11807_18960045", "wsnp_Ex_c12269_19597341", 
"wsnp_Ex_c12269_19597415", "wsnp_Ex_c12341_19693570", "wsnp_Ex_c12354_19711297", 
"wsnp_Ex_c12750_20243224", "wsnp_Ex_c12948_20511479", "wsnp_Ex_c1335_2556442", 
"wsnp_Ex_c13802_21639096", "wsnp_Ex_c14202_22144844", "wsnp_Ex_c14202_22145136", 
"wsnp_Ex_c14202_22145805", "wsnp_Ex_c14340_22315611", "wsnp_Ex_c14400_22381382", 
"wsnp_Ex_c14400_22381548", "wsnp_Ex_c14420_22402673", "wsnp_Ex_c15100_23284023", 
"wsnp_Ex_c15269_23491104", "wsnp_Ex_c15269_23492289", "wsnp_Ex_c1533_2930233", 
"wsnp_Ex_c1538_2937905", "wsnp_Ex_c15475_23756906", "wsnp_Ex_c15475_23757972", 
"wsnp_Ex_c15674_24004513", "wsnp_Ex_c15674_24005648", "wsnp_Ex_c16079_24507688", 
"wsnp_Ex_c1660_3159173", "wsnp_Ex_c16615_25147492", "wsnp_Ex_c18596_27457344", 
"wsnp_Ex_c18637_27508578", "wsnp_Ex_c20273_29326769", "wsnp_Ex_c2148_4035913", 
"wsnp_Ex_c22435_31629303", "wsnp_Ex_c22888_32105519", "wsnp_Ex_c24085_33332723", 
"wsnp_Ex_c24731_33983680", "wsnp_Ex_c2502_4675968", "wsnp_Ex_c25132_34396655", 
"wsnp_Ex_c26887_36107413", "wsnp_Ex_c27150_36365659", "wsnp_Ex_c28679_37784954", 
"wsnp_Ex_c28930_38008757", "wsnp_Ex_c29742_38738725", "wsnp_Ex_c32003_40728918", 
"wsnp_Ex_c3478_6369892", "wsnp_Ex_c35073_43285821", "wsnp_Ex_c35457_43602830", 
"wsnp_Ex_c361_707953", "wsnp_Ex_c361_708712", "wsnp_Ex_c37208_45002588", 
"wsnp_Ex_c41283_48140956", "wsnp_Ex_c41283_48141201", "wsnp_Ex_c47763_52874806", 
"wsnp_Ex_c47907_52974924", "wsnp_Ex_c48136_53140385", "wsnp_Ex_c5047_8963671", 
"wsnp_Ex_c51776_55603135", "wsnp_Ex_c53364_56625806", "wsnp_Ex_c55051_57706127", 
"wsnp_Ex_c5623_9891427", "wsnp_Ex_c5623_9891516", "wsnp_Ex_c5623_9891584", 
"wsnp_Ex_c56525_58609595", "wsnp_Ex_c57322_59083238", "wsnp_Ex_c57322_59084809", 
"wsnp_Ex_c57322_59084950", "wsnp_Ex_c5929_10402147", "wsnp_Ex_c5997_10512308", 
"wsnp_Ex_c60462_60905848", "wsnp_Ex_c6217_10848574", "wsnp_Ex_c6833_11782875", 
"wsnp_Ex_c742_1458743", "wsnp_Ex_c763_1503467", "wsnp_Ex_c8409_14170476", 
"wsnp_Ex_c88767_80001420", "wsnp_Ex_c8884_14841846", "wsnp_Ex_c9145_15214903", 
"wsnp_Ex_c943_1808232", "wsnp_Ex_c943_1808577", "wsnp_Ex_c9458_15679797", 
"wsnp_Ex_c9468_15696542", "wsnp_Ex_c9483_15722127", "wsnp_Ex_c9510_15761235", 
"wsnp_Ex_rep_c101340_86719115", "wsnp_Ex_rep_c101340_86719239", 
"wsnp_Ex_rep_c101942_87217430", "wsnp_Ex_rep_c102478_87635370", 
"wsnp_Ex_rep_c106152_90334299", "wsnp_Ex_rep_c108072_91444417", 
"wsnp_Ex_rep_c66357_64540369", "wsnp_Ex_rep_c66357_64540428", 
"wsnp_Ex_rep_c66685_65003087", "wsnp_Ex_rep_c66685_65003254", 
"wsnp_Ex_rep_c66685_65003625", "wsnp_Ex_rep_c66867_65267909", 
"wsnp_Ex_rep_c66907_65324299", "wsnp_Ex_rep_c67349_65914945", 
"wsnp_Ex_rep_c67460_66057400", "wsnp_Ex_rep_c67588_66227926", 
"wsnp_Ex_rep_c67635_66291944", "wsnp_Ex_rep_c67635_66292308", 
"wsnp_Ex_rep_c67635_66292689", "wsnp_Ex_rep_c67727_66398596", 
"wsnp_Ex_rep_c67786_66472568", "wsnp_Ex_rep_c69034_67934852", 
"wsnp_Ex_rep_c69034_67935465", "wsnp_Ex_rep_c69314_68244036", 
"wsnp_Ex_rep_c69314_68244502", "wsnp_Ex_rep_c69577_68526990", 
"wsnp_Ex_rep_c69752_68711460", "wsnp_Ex_rep_c69864_68823765", 
"wsnp_Ex_rep_c69864_68824236", "wsnp_Ex_rep_c69864_68824319", 
"wsnp_JD_c1187_1731186", "wsnp_JD_c29019_23208078", "wsnp_JD_c3034_4017676", 
"wsnp_JD_c5699_6859527", "wsnp_JD_c7532_8615717", "wsnp_JD_c8207_9234643", 
"wsnp_JD_c9434_10274598a", "wsnp_JD_c968_1427139", "wsnp_Ku_c10468_17301042", 
"wsnp_Ku_c11052_18135847", "wsnp_Ku_c18497_27803432", "wsnp_Ku_c19456_28944589", 
"wsnp_Ku_c23901_33846711", "wsnp_Ku_c30545_40369365", "wsnp_Ku_c32404_42016343", 
"wsnp_Ku_c3286_6111360", "wsnp_Ku_c38911_47455674", "wsnp_Ku_c38911_47455924", 
"wsnp_Ku_c40218_48484410", "wsnp_Ku_c4568_8243646", "wsnp_Ku_c4568_8243775", 
"wsnp_Ku_c4886_8753646", "wsnp_Ku_c5243_9344536", "wsnp_Ku_c5359_9531713", 
"wsnp_Ku_c5378_9559013", "wsnp_Ku_c7811_13387117", "wsnp_Ku_c8400_14280021", 
"wsnp_Ku_c9433_15811664", "wsnp_Ku_rep_c104691_91086871", "wsnp_Ku_rep_c69970_69476502", 
"wsnp_Ku_rep_c70479_70079622", "wsnp_Ku_rep_c71761_71496470", 
"wsnp_Ra_c10669_17515792", "wsnp_Ra_c11291_18338838", "wsnp_Ra_c132_291198", 
"wsnp_Ra_c16053_24607526", "wsnp_Ra_c16278_24893033", "wsnp_Ra_c16846_25598885", 
"wsnp_Ra_c19079_28210829", "wsnp_Ra_c19079_28210937", "wsnp_Ra_c27831_37346894", 
"wsnp_Ra_c29280_38672141", "wsnp_Ra_c35889_44345459", "wsnp_Ra_c44141_50623811", 
"wsnp_Ra_c4858_8709000", "wsnp_Ra_c7280_12576178", "wsnp_Ra_rep_c106523_90273922", 
"wsnp_Ra_rep_c72670_70836439a", "wsnp_RFL_Contig2011_1216801", 
"wsnp_RFL_Contig2767_2518373", "wsnp_RFL_Contig3866_4228783", 
"wsnp_RFL_Contig429_4978628", "wsnp_RFL_Contig4734_5671036", 
"wsnp_RFL_Contig4814_5829093"), class = "factor"), MEL_004.3_haplotype1 = c(0.00033, 
1e-05, 0, 0, 0, 0, 0, 0, 0.00013, 0.00013, 0.00014, 0.00022, 
0.00022, 0.00022, 0.00022, 0.00027, 0.00027, 0.00023, 0.00023, 
0.00023, 0.00022, 0.00022, 0.00015, 0.00014, 3e-05, 3e-05, 2e-05, 
1e-05, 1e-05, 0), MEL_004.3_haplotype2 = c(0.00033, 1e-05, 0, 
0, 0, 0, 0, 0, 0.00013, 0.00013, 0.00014, 0.00022, 0.00022, 0.00022, 
0.00022, 0.00027, 0.00027, 0.00023, 0.00023, 0.00023, 0.00022, 
0.00022, 0.00015, 0.00014, 3e-05, 3e-05, 2e-05, 1e-05, 1e-05, 
0), MEL_004.3_haplotype3 = c(0.00033, 1e-05, 0, 0, 0, 0, 0, 0, 
0.00013, 0.00013, 0.00014, 0.00022, 0.00022, 0.00022, 0.00022, 
0.00027, 0.00027, 0.00023, 0.00023, 0.00023, 0.00022, 0.00022, 
0.00015, 0.00014, 3e-05, 3e-05, 2e-05, 1e-05, 1e-05, 0), MEL_004.3_haplotype4 = c(0.00033, 
1e-05, 0, 0, 0, 4e-05, 4e-05, 4e-05, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.00021), MEL_004.3_haplotype5 = c(0.00033, 
1e-05, 0, 0, 0, 4e-05, 4e-05, 4e-05, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), MEL_004.3_haplotype6 = c(0.96403, 
0.96583, 0.96858, 0.97124, 0.97155, 0.97417, 0.97446, 0.97749, 
0.98022, 0.9804, 0.98058, 0.98303, 0.98316, 0.9833, 0.98345, 
0.98632, 0.98946, 0.99245, 0.99259, 0.99272, 0.99286, 0.993, 
0.9958, 0.9961, 0.99927, 0.99939, 0.99948, 0.9996, 0.99972, 0.99973
), MEL_004.3_haplotype7 = c(0, 0, 0, 0, 0, 4e-05, 4e-05, 4e-05, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
2e-05), MEL_004.3_haplotype8 = c(0.03431, 0.03407, 0.03141, 0.02873, 
0.02844, 0.0257, 0.0254, 0.02237, 0.0194, 0.01919, 0.01897, 0.01631, 
0.01616, 0.01602, 0.01586, 0.01286, 0.00971, 0.00683, 0.0067, 
0.00658, 0.00646, 0.00633, 0.00374, 0.00346, 0.00065, 0.00055, 
0.00045, 0.00034, 0.00024, 3e-05), MEL_005.3_haplotype1 = c(0.21684, 
0.21698, 0.21721, 0.21746, 0.21741, 0.21704, 0.21694, 0.21582, 
0.21465, 0.21458, 0.21448, 0.21345, 0.2134, 0.21333, 0.21328, 
0.21213, 0.21094, 0.20988, 0.20983, 0.20977, 0.20973, 0.20969, 
0.20874, 0.20864, 0.20762, 0.20758, 0.20754, 0.20752, 0.20747, 
0.20744), MEL_005.3_haplotype2 = c(0.03821, 0.03799, 0.03566, 
0.03332, 0.03305, 0.0306, 0.03031, 0.02743, 0.02462, 0.02441, 
0.02421, 0.02168, 0.02154, 0.0214, 0.02124, 0.01837, 0.01534, 
0.01254, 0.01241, 0.01229, 0.01218, 0.01205, 0.00953, 0.00925, 
0.00652, 0.0064, 0.0063, 0.0062, 0.0061, 0.00599), MEL_005.3_haplotype3 = c(0.72948, 
0.73078, 0.73836, 0.74606, 0.74668, 0.75236, 0.75276, 0.75674, 
0.76033, 0.76059, 0.76085, 0.76416, 0.76434, 0.76454, 0.76474, 
0.76859, 0.77275, 0.77666, 0.77684, 0.777, 0.77718, 0.77736, 
0.78097, 0.78137, 0.7854, 0.78554, 0.7857, 0.78584, 0.78599, 
0.78614), MEL_005.3_haplotype4 = c(0.0089, 0.0086, 0.00586, 0.00311, 
0.0028, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0), MEL_005.3_haplotype5 = c(0.00588, 0.00558, 
0.00281, 1e-05, 2e-05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), MEL_005.3_haplotype6 = c(0.00033, 
1e-05, 0, 0, 0, 0, 0, 0, 0.00018, 0.00019, 2e-04, 3e-04, 3e-04, 
0.00031, 0.00032, 0.00038, 0.00038, 0.00033, 0.00032, 0.00031, 
0.00031, 3e-04, 0.00021, 2e-04, 4e-05, 2e-05, 2e-05, 1e-05, 1e-05, 
0), MEL_005.3_haplotype7 = c(3e-05, 5e-05, 6e-05, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0), MEL_005.3_haplotype8 = c(0.00033, 1e-05, 0, 0, 0, 0, 0, 
0, 0.00022, 0.00024, 0.00025, 4e-04, 0.00041, 0.00042, 0.00042, 
0.00053, 6e-04, 6e-04, 6e-04, 6e-04, 6e-04, 6e-04, 0.00054, 0.00053, 
0.00043, 0.00043, 0.00042, 0.00042, 0.00041, 0.00041), MEL_005.4_haplotype1 = c(0.16705, 
0.16711, 0.16662, 0.16614, 0.16603, 0.16508, 0.16492, 0.16332, 
0.16169, 0.16156, 0.16144, 0.16, 0.15991, 0.15983, 0.15975, 0.15811, 
0.1564, 0.15485, 0.15479, 0.15472, 0.15464, 0.15458, 0.1532, 
0.15304, 0.15154, 0.15148, 0.15142, 0.15138, 0.15132, 0.15126
), MEL_005.4_haplotype2 = c(0.03823, 0.03801, 0.03569, 0.03335, 
0.03308, 0.0306, 0.03032, 0.02746, 0.02463, 0.02443, 0.02423, 
0.0217, 0.02156, 0.02141, 0.02126, 0.01837, 0.01535, 0.01256, 
0.01243, 0.0123, 0.01218, 0.01206, 0.00955, 0.00925, 0.0065, 
0.00641, 0.00632, 0.0062, 0.0061, 0.006), MEL_005.4_haplotype3 = c(0.77923, 
0.78063, 0.78893, 0.79736, 0.79805, 0.80432, 0.80476, 0.80922, 
0.81328, 0.81357, 0.81386, 0.8176, 0.81781, 0.81801, 0.81825, 
0.82258, 0.82726, 0.83167, 0.83187, 0.83205, 0.83224, 0.83244, 
0.8365, 0.83695, 0.84147, 0.84164, 0.84181, 0.84198, 0.84215, 
0.8423), MEL_005.4_haplotype4 = c(0.00889, 0.00861, 0.00587, 
0.00312, 0.0028, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), MEL_005.4_haplotype5 = c(0.00589, 
0.00559, 0.00282, 1e-05, 2e-05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), MEL_005.4_haplotype6 = c(0.00033, 
1e-05, 0, 0, 0, 0, 0, 0, 0.00018, 0.00019, 2e-04, 3e-04, 3e-04, 
0.00031, 0.00031, 0.00037, 0.00037, 0.00032, 0.00031, 0.00031, 
0.00031, 3e-04, 0.00021, 2e-04, 4e-05, 2e-05, 2e-05, 1e-05, 1e-05, 
0), MEL_005.4_haplotype7 = c(3e-05, 5e-05, 6e-05, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0)), .Names = c("SNP", "MEL_004.3_haplotype1", "MEL_004.3_haplotype2", 
"MEL_004.3_haplotype3", "MEL_004.3_haplotype4", "MEL_004.3_haplotype5", 
"MEL_004.3_haplotype6", "MEL_004.3_haplotype7", "MEL_004.3_haplotype8", 
"MEL_005.3_haplotype1", "MEL_005.3_haplotype2", "MEL_005.3_haplotype3", 
"MEL_005.3_haplotype4", "MEL_005.3_haplotype5", "MEL_005.3_haplotype6", 
"MEL_005.3_haplotype7", "MEL_005.3_haplotype8", "MEL_005.4_haplotype1", 
"MEL_005.4_haplotype2", "MEL_005.4_haplotype3", "MEL_005.4_haplotype4", 
"MEL_005.4_haplotype5", "MEL_005.4_haplotype6", "MEL_005.4_haplotype7"
), row.names = c(NA, 30L), class = "data.frame")

Thank you

  • 1
    Could you post the output of `dput(A3[1:30, 1:24])`? That would give us a reproducible example of (a subset of) your data to use to show you the solution. – David Robinson Sep 04 '15 at 14:46
  • In addition to coment above: code in question is unusable, where is A4 coming from ? what is it ? as A5 is taken from it a,d all turn around it later, it's a showstopper just on the third line – Tensibai Sep 04 '15 at 14:52
  • R doesn't like wide data, but long data is better for analysis in R. I'd look into [`tidyr`'s](https://cran.r-project.org/web/packages/tidyr/index.html) [`gather`](http://blog.rstudio.org/2014/07/22/introducing-tidyr/) – Paul James Sep 04 '15 at 14:55
  • @Tensibai sorry for the error, A3 should be A4 in the beginning. Just fixed it – Ioannis Baltzakis Sep 04 '15 at 15:15
  • 1
    Even with your edit you haven't shared a reproducible example (with `dput(A3[1:30, 1:24])`). Could you please share one? (Otherwise anyone trying to solve your problem is going to have to *guess* whether their solution would work) – David Robinson Sep 04 '15 at 15:16
  • @DavidRobinson should I make it a file link or just copy and paste the code here ? Thank you for your time! – Ioannis Baltzakis Sep 04 '15 at 15:17
  • Just copy and paste the result from dput into your question, yep – David Robinson Sep 04 '15 at 15:18

1 Answers1

3

You can do this efficiently with the dplyr package, after first gathering all probabilities into a single column (very similar to what you were doing with melt):

library(dplyr)
library(tidyr)

pairs <- A4 %>%
  gather(individual, value, -SNP) %>%
  separate(individual, c("group", "founder"), sep = "_haplotype") %>%
  group_by(SNP, group) %>%
  filter(value == max(value)) %>%
  filter(n() == 2) %>%
  ungroup() %>%
  arrange(SNP)

(By the way, the ungroup and arrange aren't strictly necessary here: it is just nice to have it sorted by SNP when you view the output).

The subset of data you posted had no 2-way ties for first place. But we could add 5 ties artificially with:

A4[1:5, 2:3] <- .999

At this point, the output would look like:

Source: local data frame [10 x 4]

                      SNP     group founder value
                   (fctr)     (chr)   (chr) (dbl)
1     BobWhite_c18256_105 MEL_004.3       1 0.999
2     BobWhite_c18256_105 MEL_004.3       2 0.999
3           BS00081981_51 MEL_004.3       1 0.999
4           BS00081981_51 MEL_004.3       2 0.999
5    Excalibur_c42951_136 MEL_004.3       1 0.999
6    Excalibur_c42951_136 MEL_004.3       2 0.999
7       RAC875_c46403_277 MEL_004.3       1 0.999
8       RAC875_c46403_277 MEL_004.3       2 0.999
9  Tdurum_contig83209_316 MEL_004.3       1 0.999
10 Tdurum_contig83209_316 MEL_004.3       2 0.999

This shows pairs of probability that were tied within a SNP and group. Incidentally, you might prefer to reorganize this into one-row-per-tie, which could be done with tidyr's spread:

pair_ties <- pairs %>%
  group_by(SNP, group) %>%
  mutate(column = c("founder1", "founder2")) %>%
  spread(column, founder)

Which would have output:

Source: local data frame [5 x 5]

                     SNP     group value founder1 founder2
                  (fctr)     (chr) (dbl)    (chr)    (chr)
1    BobWhite_c18256_105 MEL_004.3 0.999        1        2
2          BS00081981_51 MEL_004.3 0.999        1        2
3   Excalibur_c42951_136 MEL_004.3 0.999        1        2
4      RAC875_c46403_277 MEL_004.3 0.999        1        2
5 Tdurum_contig83209_316 MEL_004.3 0.999        1        2

As a quick explanation of how this works- it relies on the dplyr package, which defines a grammar for data manipulation in terms of sequential steps, piped together with the %>% "pipe" operator. One of the best intros is here. Those are worth reading in general for what each function does. In this case, imagine it happening in four stages:

x1 <- A4 %>%
  gather(individual, value, -SNP) %>%
  separate(individual, c("group", "founder"), sep = "_haplotype")

These steps do the same thing as melt in your code: they gather all the probabilities into a single column called value, and move the column names into a column called individual alongside it. It then separates the individual into a "group" column (your 600+ groups) and a "founder" column (1-8).

x2 <- x1 %>%
  group_by(SNP, group) %>%
  filter(value == max(value))

This says you want to look within each group made up of a unique SNP and group. This is taking the place of your nested for loops above. Within each group, it filters for rows where the value column is as large as the maximum.

x3 <- x2 %>%
  filter(n() == 2)

The data frame is still grouped, and only has ties for first-place, and this is saying you want to include only groups of size 2. At this point, the work is done, and the last two steps just sort it by SNP to make it more readable:

result <- x3 %>%
  ungroup() %>%
  arrange(SNP)

Incidentally, to estimate how long this would take on your data, I simulated some data of the same size as yours:

groups <- 600
founders <- 8
SNPs <- 896

vals <- as.data.frame(replicate(groups * founders, sample(10, SNPs, replace = TRUE) / 10))
names(vals) <- paste0("P", rep(seq_len(groups), each = founders), "_", seq_len(founders))

SNPnames <- paste("SNP", seq_len(SNPs))
A4 <- cbind(SNP = SNPnames, vals)

On this A4, the code took about 30 seconds to run on my machine.

David Robinson
  • 77,383
  • 16
  • 167
  • 187
  • With the simulated data your solution runs fine. When I am using my data set I get : Error: C stack usage 40425427 is too close to the limit – Ioannis Baltzakis Sep 04 '15 at 15:54
  • @user919367 That can happen with `gather` on large datasets. Please try a) Restarting R fresh and re-running, b) Restarting R with [this approach to increase your max stack size](http://stackoverflow.com/questions/14719349/error-c-stack-usage-is-too-close-to-the-limit), c) If that doesn't work, Try it on half your data. If it works, just do it in two steps. – David Robinson Sep 04 '15 at 15:56
  • Thank you. You are truly awesome. If I may ask, could I get a small explanation on your code if you have time and feel like it ? – Ioannis Baltzakis Sep 04 '15 at 16:02
  • @user919367 I've edited in an explanation. (By the way, if the answer solved your problem you can "accept" it by clicking on the green checkmark next to it) – David Robinson Sep 04 '15 at 16:44