My problem is the following:
I have a data.frame in R that contains SNP coordinates, e.g.

SNP1  chr1  123456  
SNP2  chr1  156895  
SNP3  chr1  550000  
...

Now I want to specify a region (e.g. chr1:100000-500000) and a number of SNPs (n), and find the n SNPs within that region that are most evenly distributed across it.

I have a script that divides the region into n-1 pieces and selects the SNPs that are closest to the borders of the pieces. If a SNP would be selected twice, it skips it and takes the next closest SNP instead. Still, there might be a better way to select evenly distributed SNPs (maybe by somehow maximizing the total distance between them, although the total number of SNPs is quite high).
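
To make the approach above concrete, here is a minimal sketch of it, not the actual script. The column names (`snp`, `chr`, `pos`), the function name `pick_even_snps`, and the example positions beyond the three shown above are made up for illustration. It places n target positions at the borders of the n-1 pieces and greedily takes the closest SNP that has not been picked yet.

```r
# Hypothetical example data; column names and most positions are assumptions.
snps <- data.frame(
  snp = paste0("SNP", 1:8),
  chr = "chr1",
  pos = c(123456, 156895, 210000, 215000, 330000, 410000, 480000, 550000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick n SNPs in chr:start-end near evenly spaced targets.
pick_even_snps <- function(snps, chr, start, end, n) {
  # Keep only SNPs inside the requested region, sorted by position.
  region <- snps[snps$chr == chr & snps$pos >= start & snps$pos <= end, ]
  region <- region[order(region$pos), ]
  if (nrow(region) <= n) return(region)

  # n target positions = the borders of n - 1 equally sized pieces.
  targets <- seq(start, end, length.out = n)

  chosen <- integer(0)
  for (target in targets) {
    dist <- abs(region$pos - target)
    dist[chosen] <- Inf                   # never pick the same SNP twice
    chosen <- c(chosen, which.min(dist))  # closest SNP still available
  }
  region[sort(chosen), ]
}

picked <- pick_even_snps(snps, "chr1", 100000, 500000, n = 4)
print(picked)
```

Because the targets are handled greedily from left to right, the result depends on that order and is not guaranteed to be the most even selection, which is exactly the weakness I am asking about.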

Justin
UUU
  • Can you provide some [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) data and code? Maybe show us what your script does? The `"data.frame"` you've shown doesn't make this question easy to understand or answer. – Justin Aug 23 '12 at 14:58
  • In order to determine what the best definition of "evenly distributed" is, we'd need to know why you're looking for evenly distributed SNPs in the first place (I assume it has something to do with linkage). – David Robinson Aug 23 '12 at 15:50
  • @Seth: That wouldn't work if the data were anything but uniformly distributed, and could come very far from working if the density is very inconsistent. – David Robinson Aug 23 '12 at 19:01
