3

Say I have a range(1, n + 1). I want to get m unique pairs.

What I found is, if the number of pairs is close to n(n-1)/2 (maxiumum number of pairs), one can't simply generate random pairs everytime because they will start overriding eachother. I'm looking for a somewhat lazy solution, that will be very efficient (in Python's world).

My attempt so far:

def get_input(n, m):

    res = str(n) + "\n" + str(m) + "\n"
    buffet = range(1, n + 1)
    points = set()

    while len(points) < m:
        x, y = random.sample(buffet, 2)
        points.add((x, y)) if x > y else points.add((y, x)) # meeh

    for (x, y) in points:
        res += "%d %d\n" % (x, y);

    return res
John Coleman
  • 51,337
  • 7
  • 54
  • 119
Afonso Matos
  • 2,406
  • 1
  • 20
  • 30

3 Answers3

3

You can use combinations to generate all pairs and use sample to choose randomly. Admittedly only lazy in the "not much to type" sense, and not in the use a generator not a list sense :-)

from itertools import combinations
from random import sample

n = 100
sample(list(combinations(range(1,n),2)),5)

If you want to improve performance you can make it lazy by studying this Python random sample with a generator / iterable / iterator

the generator you want to sample from is this: combinations(range(1,n)

Christian Sloper
  • 7,440
  • 3
  • 15
  • 28
3

Here is an approach which works by taking a number in the range 0 to n*(n-1)/2 - 1 and decodes it to a unique pair of items in the range 0 to n-1. I used 0-based math for convenience, but you could of course add 1 to all of the returned pairs if you want:

import math
import random

def decode(i):
    k = math.floor((1+math.sqrt(1+8*i))/2)
    return k,i-k*(k-1)//2

def rand_pair(n):
    return decode(random.randrange(n*(n-1)//2))

def rand_pairs(n,m):
    return [decode(i) for i in random.sample(range(n*(n-1)//2),m)]

For example:

>>> >>> rand_pairs(5,8)
[(2, 1), (3, 1), (4, 2), (2, 0), (3, 2), (4, 1), (1, 0), (4, 0)]

The math is hard to easily explain, but the k in the definition of decode is obtained by solving a quadratic equation which gives the number of triangular numbers which are <= i, and where i falls in the sequence of triangular numbers tells you how to decode a unique pair from it. The interesting thing about this decode is that it doesn't use n at all but implements a one-to-one correspondence from the set of natural numbers (starting at 0) to the set of all pairs of natural numbers.

John Coleman
  • 51,337
  • 7
  • 54
  • 119
  • Is there a link to a more comprehensive explanation? Thanks. – Afonso Matos Mar 19 '19 at 18:03
  • This is perfect by the way. – Afonso Matos Mar 19 '19 at 18:12
  • @AfonsoMatos I don't know of a link. I started with the enumeration `(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (4, 3)` (which is the sort of enumeration you need if it isn't to depend on `n`) noticed that there was 1 which started with 1, 2 which started with 2, etc. To get the pair from `i`, I needed to know what "block" `i` is in. `i = 7`, for example is in the 4-block *because* `1+2+3 <= 7 < 1+2+3+4` which translates to `3(3-1)/2 <= 7 < 4(4-1)/2` If you solve `7 = k(k-1)/2` and turn to an integer, you get the `k`. A little more algebra gets you the other index. – John Coleman Mar 20 '19 at 13:55
1

I don't think any thing on your line can improve. After all, as your m get closer and closer to the limit n(n-1)/2, you have thinner and thinner chance to find the unseen pair.

I would suggest to split into two cases: if m is small, use your random approach. But if m is large enough, try

 pairs = list(itertools.combination(buffet,2))
 ponits = random.sample(pairs, m)

Now you have to determine the threshold of m that determines which code path it should go. You need some math here to find the right trade off.

adrtam
  • 6,991
  • 2
  • 12
  • 27