My problem is the following: I have a data.frame in R that contains SNP coordinates, e.g.
SNP1 chr1 123456
SNP2 chr1 156895
SNP3 chr1 550000
...
Now I want to specify a region (e.g. chr1:100000-500000) and a number of SNPs (n), and find the n SNPs in that region that are most evenly distributed across it.
I have a script that divides the region into n-1 pieces and selects the SNPs closest to the borders of the pieces. It also handles SNPs that would be selected twice by taking the next-closest SNP instead. Still, there might be a better way to select them evenly (maybe by somehow maximizing the total distance between them, although the total number of SNPs is quite high?).
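
For reference, here is a minimal sketch of the border-based approach described above. It is not my actual script. The column names (snp, chr, pos), the function name select_even_snps and the example rows beyond the three shown above are made up for illustration.

```r
# Illustrative data; column names and most positions are assumptions.
snps <- data.frame(
  snp = c("SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"),
  chr = "chr1",
  pos = c(123456, 156895, 250000, 310000, 480000, 550000),
  stringsAsFactors = FALSE
)

# Pick n SNPs in chr:start-end that lie closest to n evenly spaced
# target positions (the borders of n-1 equal pieces).
select_even_snps <- function(df, chr, start, end, n) {
  # Keep only SNPs inside the region, sorted by position
  region <- df[df$chr == chr & df$pos >= start & df$pos <= end, ]
  region <- region[order(region$pos), ]

  # Nothing to choose if the region holds n SNPs or fewer
  if (nrow(region) <= n) return(region)

  targets <- seq(start, end, length.out = n)
  available <- rep(TRUE, nrow(region))
  chosen <- integer(0)

  for (target in targets) {
    d <- abs(region$pos - target)
    d[!available] <- Inf        # a SNP already picked cannot be picked again
    i <- which.min(d)           # next-closest SNP that is still free
    chosen <- c(chosen, i)
    available[i] <- FALSE
  }

  region[sort(chosen), ]
}

picked <- select_even_snps(snps, "chr1", 100000, 500000, 3)
print(picked)
```

With this toy data the targets are 100000, 300000 and 500000, so SNP1, SNP4 and SNP5 are returned. The selection is greedy: each target takes the nearest free SNP in order, which is simple and fast but not guaranteed to give the most even spacing overall.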