6

I have a list similar to:

[1 2 1 4 5 2 3 2 4 5 3 1 4 2] 

I want to create a list of x random elements from this list where none of the chosen elements are the same. The difficult part is that I would like to do this by using list comprehension... So possible results if x = 3 would be:

[1 2 3]
[2 4 5]
[3 1 4]
[4 5 1]

etc...

Thanks!

I should have specified that I cannot convert the list to a set. Sorry! I need the randomly selected numbers to be weighted. So if 1 appears 4 times in the list and 3 appears 2 times in the list, then 1 is twice as likely to be selected...

braden.groom
  • 6
    Have you worked out a way to do it _without_ a list comprehension? – John La Rooy Oct 16 '13 at 03:41
  • Have you considered using a set? – Asad Saeeduddin Oct 16 '13 at 03:41
  • 4
    You need to clarify the question: is an outcome like `[1, 2, 1]` OK -- in other words, a sub-list having two of the same values (1 in this case). – FMc Oct 16 '13 at 04:11
  • ...where none of the chosen elements are the same ... – dansalmo Oct 16 '13 at 04:21
  • @FMc what else would that particular clause mean? – Free Monica Cellio Oct 16 '13 at 04:36
  • @dansalmo, using mgilsons marble analogy. If you have marbles with the numbers drawn on them. You can only select each marble once from the original list. It's not clear if you have have two marbles with the same number in the resulting lists. – John La Rooy Oct 16 '13 at 04:38
  • So you mean that `[1 2 3]` is a valid result but `[1 1 2]` is not because the `1` is the same? – Claudiu Oct 16 '13 at 04:59
  • There are already answers, but see also these questions: [selection based on percentage weighting](http://stackoverflow.com/questions/3655430/selection-based-on-percentage-weighting/3655542#3655542), [Weighted random selection with and without replacement](http://stackoverflow.com/questions/352670/weighted-random-selection-with-and-without-replacement) –  Oct 16 '13 at 06:24
  • @gnibbler, I see your point regarding that possible interpretation, but given the lack of a clear definition or answer to questions I think he means: Random choice of elements with replacement and no two values repeated in the final list. – dansalmo Oct 16 '13 at 17:51

9 Answers

11

Disclaimer: the "use a list comprehension" requirement is absurd.

Moreover, if you want to use the weights, there are many excellent approaches listed at Eli Bendersky's page on weighted random sampling.

The following is inefficient, doesn't scale, etc., etc.

That said, it has not one but two (TWO!) list comprehensions, returns a list, never duplicates elements, and respects the weights in a sense:

>>> import itertools
>>> import random
>>> s = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[3, 1, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[5, 3, 4]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[1, 5, 2]

.. or, as simplified by FMc:

>>> [x for x in random.choice([p for p in itertools.permutations(s, 3) if len(set(p)) == 3])]
[3, 5, 2]

(I'll leave the x for x in there, even though it hurts not to simply write list(random.choice(..)) or just leave it as a tuple..)

DSM
  • 1
    Clever answer to a crazy question. Maybe you're just trying to juice the list-comprehension stats, but isn't `itertools.permutations(s, 3)` sufficient? No need for both combinations and permutations. – FMc Oct 16 '13 at 04:24
  • @FMc: no, I just realized at the last second that `combinations` wouldn't suffice and added `permutations` in there without thinking about whether I needed them both. Good call :^) – DSM Oct 16 '13 at 04:30
  • This is a great answer, but I'm still wondering if it answers OP's question. What if instead of having a list of integers, we had a list of custom objects with equality based on object ID. You can't guarantee that you won't pull the same object out two times in a row. The way I envision this is as a bag of marbles -- each with a value. You randomly pull 3 marbles out at a time until there are none left -- And there may or may not be a constraint that some marbles can't be extracted together ... What you're doing is putting the 3 marbles you just pulled back into the bag and pulling again. – mgilson Oct 16 '13 at 04:35
  • @mgilson: I didn't think the OP's question was ambiguous when I first read it but after all the points you and others have raised I no longer have any idea what's going on. :^) You're right that I'm treating it as a coins-in-a-jar, draw-until-you-get-three-different problem. – DSM Oct 16 '13 at 04:38
  • I don't think any of us know what's going on. We're all just guessing as best as we can... – mgilson Oct 16 '13 at 04:43
  • @DSM: Nice answer! @mgilson: This depends on what `__hash__` and `__eq__` do for the objects you're using. If objects are equivalent based on object ID then define those methods & this approach will work – Claudiu Oct 16 '13 at 05:23
  • @Claudiu -- I know how `__hash__` and `__eq__` work. I don't have a problem with the `set` use here. The question is whether it is OK to put the objects back in the "bag" (or coin jar as DSM put it) after you've drawn them out before the next draw. – mgilson Oct 16 '13 at 06:51
  • @mgilson: oh, this code doesn't do that. it just picks one of the possible valid permutations, and each permutation is done without putting things back in the "bag" - at least that's how I'm reading the very last snippet – Claudiu Oct 16 '13 at 15:39
  • @Claudiu -- Each time you call this one, the elements available in the bag are the same (whatever was in `s`). Each draw is independent of the previous draws, so it's as if you take the objects out of the bag and then put them back in -- shake it around and draw again. To remove the items from the "bag", he would need to remove them from `s` before the next draw. – mgilson Oct 16 '13 at 15:52
  • @mgilson: oh I gotcha. i didn't think that's what the OP is asking, which is why it took me a while to get what you were saying. – Claudiu Oct 16 '13 at 16:28
6

Generally, you don't want to do this sort of thing in a list comprehension -- it leads to much harder-to-read code. However, if you really must, we can write a completely horrible one-liner:

>>> import random
>>> values = [random.randint(0,10) for _ in xrange(12)]
>>> values
[1, 10, 6, 6, 3, 9, 0, 1, 8, 9, 1, 2]
>>> # This is the 1 liner -- The other line was just getting us a list to work with.
>>> [(lambda x=random.sample(values,3):any(values.remove(z) for z in x) or x)() for _ in xrange(4)]
[[6, 1, 8], [1, 6, 10], [1, 0, 2], [9, 3, 9]]

Please never use this code -- I only post it for fun/academic reasons.

Here's how it works:

I create a function inside the list comprehension with a default argument of 3 randomly selected elements from the input list. Inside the function, I remove the elements from `values` so that they aren't available to be picked again. Since `list.remove` returns `None`, I can use `any(lst.remove(z) for z in ...)` to remove the values while evaluating to `False`. Since `any` returns `False`, we hit the `or` clause, which just returns `x` (the default value with 3 randomly selected items) when the function is called. All that is left then is to call the function and let the magic happen.
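As a minimal illustration of the `any`/`remove` mechanics described above (Python 3 here, but the behavior is the same):

```python
lst = [1, 2, 3, 4]

# list.remove returns None, so the generator yields only falsy values;
# any(...) therefore consumes the whole generator (removing every item)
# and evaluates to False
result = any(lst.remove(x) for x in [2, 4])
```

After this runs, `result` is `False` and `lst` is `[1, 3]` -- the removals happen as a side effect of `any` draining the generator.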

The one catch here is that you need to make sure that the number of groups you request (here I chose 4) multiplied by the number of items per group (here I chose 3) is less than or equal to the number of values in your input list. It may seem obvious, but it's probably worth mentioning anyway...
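For comparison, here is a readable multi-line sketch of the same idea (the name `draw_groups` is mine, not from the one-liner):

```python
import random

def draw_groups(values, group_size, num_groups):
    """Draw num_groups disjoint groups of group_size elements, without replacement."""
    pool = list(values)  # copy so the caller's list is untouched
    groups = []
    for _ in range(num_groups):
        group = random.sample(pool, group_size)
        for item in group:
            pool.remove(item)  # drop one occurrence so it can't be re-drawn
        groups.append(group)
    return groups
```

This preserves the one-liner's semantics: across all groups, each occurrence in the input list is used at most once, so the same caveat about `num_groups * group_size <= len(values)` applies.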

Here's another version where I pull shuffle into the list comprehension:

>>> lst = [random.randint(0,10) for _ in xrange(12)]
>>> lst
[3, 5, 10, 9, 10, 1, 6, 10, 4, 3, 6, 5]
>>> [lst[i*3:i*3+3] for i in xrange(random.shuffle(lst) or 4)]
[[6, 10, 6], [3, 4, 10], [1, 3, 5], [9, 10, 5]]

This is significantly better than my first attempt; however, most people would still need to stop and scratch their heads a bit before figuring out what this code is doing. I still maintain that it would be much better to do this in multiple lines.

mgilson
  • Could you please also post the readable version that *should* be used? – Asad Saeeduddin Oct 16 '13 at 03:57
  • @Asad -- There is no readable version (to my knowledge) which consists solely of a list comprehension like OP is asking for. Other answers do a great job giving readible versions that I would consider using for this problem if the "only list comprehension" clause was removed -- So I don't really feel the need to duplicate that effort.. – mgilson Oct 16 '13 at 04:02
  • @mgilson, it's easy enough to incorporate a shuffle into the list comprehension if you really wanted to. – John La Rooy Oct 16 '13 at 04:05
  • When I run this code, I get outcomes like `[9, 9, 2]` -- which contains two 9s. Maybe I'm confused, but I thought the OP didn't want that. – FMc Oct 16 '13 at 04:05
  • I meant without using a list comprehension, i.e. a plain old function. I didn't notice any other answers that satisfied the criteria in the question: glasslion and shashank's answers are incorrect for the OP's "weight by frequency" edit. As far as I can tell, gnibbler and Tim Peter's answer suffer from the same problem, which I've mentioned in a comment on his answer. – Asad Saeeduddin Oct 16 '13 at 04:08
  • @FMc -- The 2 nines are OK. The constraint here is that you get the same distribution of numbers out that you put in. So if the list you put in has 2 nines in it, then you want to pull exactly 2 nines out of it (at random times and in random groups). – mgilson Oct 16 '13 at 04:08
  • @mgilson You seem to have missed the first constraint (see comment on gnibbler's answer). – Asad Saeeduddin Oct 16 '13 at 04:09
  • 1
    If Asad's interpretation is correct - which is also how I first read it, then the requirement for a list comprehension is nothing short of absurd. That is what I was thinking in the first comment I made on the question. – John La Rooy Oct 16 '13 at 04:15
  • @gnibbler -- So you can pull the shuffle into the list comp. That's a bit tricky too. And ultimately ugly. I hope I've made that point well in my post. – mgilson Oct 16 '13 at 04:20
  • 1
    `shuffle` returns `None` so `xrange(shuffle(...) or 4)` would work for example – John La Rooy Oct 16 '13 at 04:22
  • Hmmm ... that's way nicer. I never thought to include it in that portion of the list-comp. I'm relying on the `shuffle` returning `None` bit too of course, just for some reason my mind got stuck on putting it in the `if` portion. – mgilson Oct 16 '13 at 04:24
2

If I'm understanding your question properly, this should work:

import random

def weighted_sample(L, x):
    # might consider raising some kind of exception if len(set(L)) < x

    while True:
        ans = random.sample(L, x)
        if len(set(ans)) == x:
            return ans

Then if you want many such samples you can just do something like:

[weighted_sample(L, x) for _ in range(num_samples)]

I have a hard time conceiving of a comprehension for the sampling logic that isn't just obfuscated. The logic is a bit too complicated. It sounds like something randomly tacked on to a homework assignment to me.

If you don't like infinite looping, I haven't tried it but I think this will work:

import collections
import random

def weighted_sample(L, x):
    ans = []
    c = collections.Counter(L)

    while len(ans) < x:
        r = random.randint(0, sum(c.values()) - 1)
        for k in c:
            if r < c[k]:
                ans.append(k)
                del c[k]
                break
            else:
                r -= c[k]
        else:
            # this should never happen on valid input
            raise RuntimeError("ran out of values to sample")

    return ans
Free Monica Cellio
  • I am not sure whether this introduces bias, but a more efficient approach than entirely discarding your random pick if there was a duplicate would be to simply randomly pick a value from the values not already in the set. So far this seems to be the only answer here that actually satisfies the OP's requirements though. – Asad Saeeduddin Oct 16 '13 at 04:12
  • Yeah, the problem there is retaining the original weighting. I'm working on a better one. :) – Free Monica Cellio Oct 16 '13 at 04:14
  • +1 I guess o.o I didn't know the frequencies had to be weighted. I deleted my answer. – Shashank Oct 16 '13 at 04:15
  • Yup. I would put an upper bound on the loop at the very least, and throw an exception if you've spent too long trying to get a non duplicate pick. The best way to do this would be to create a probability density function for the values in the array based on their frequency and use that to pick a value, which would make it linear time. I can't find anything built in to do this, however. – Asad Saeeduddin Oct 16 '13 at 04:18
  • There is certainly a danger if infinite looping if there is not valid solution (ie. less than x distinct values in the original list) – John La Rooy Oct 16 '13 at 04:20
  • @gnibbler I put a comment about how to catch that case (in both versions) – Free Monica Cellio Oct 16 '13 at 04:22
0
def sample(self, population, k):
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = int(random.random() * n)
            while j in selected:
                j = int(random.random() * n)
            selected_add(j)
            result[i] = population[j]
    except (TypeError, KeyError):   # handle (at least) sets
        if isinstance(population, list):
            raise
        return self.sample(tuple(population), k)
    return result

Above is a simplified version of the `sample` function from Lib/random.py; I only removed some optimization code for small data sets. The code shows directly how to implement a customized sample function:

  1. get a random number
  2. if the number has appeared before, discard it and get a new one
  3. repeat the above steps until you have all the sample numbers you want

Then the real problem turns out to be how to get a random value from a list by weight. This could be done with the original `random.sample(population, 1)` from the Python standard library (a little overkill here, but simple).

Below is an implementation. Because duplicates represent weight in your given list, we can use `int(random.random() * array_length)` to get a random index into your array.

import random
arr = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

def sample_by_weight( population, k):
    n = len(population)
    if not 0 <= k <= len(set(population)):
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = population[int(random.random() * n)]
            while j in selected:
                j = population[int(random.random() * n)]
            selected_add(j)
            result[i] = j
    except (TypeError, KeyError):   # handle (at least) sets
        if isinstance(population, list):
            raise
        return sample_by_weight(tuple(population), k)
    return result

[sample_by_weight(arr,3) for i in range(10)]
Leonardo.Z
  • What is the purpose of the comprehension here? `random.sample(arr,3)` already returns a sample of 3 elements from the array. – Asad Saeeduddin Oct 16 '13 at 03:43
  • Doing this doesn't guarantee that my selected elements are going to be different though. This is close to what I need... If random.sample would return 3 different elements then this would be perfect! – braden.groom Oct 16 '13 at 03:44
  • @Asad If did not misunderstand,OP says he/she wants a list and elements inside are lists of 3 integers. – Leonardo.Z Oct 16 '13 at 03:46
  • @JoranBeasley the data in arr is provided by OP and I use range(10) here because OP did not mentioned the size of the list he wants. – Leonardo.Z Oct 16 '13 at 03:50
  • @braden.groom I think my updated answer can keep the sample result unique while selecting them by weight. – Leonardo.Z Oct 16 '13 at 06:18
0

First of all, I assume your list looks like:

[1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

If you want to print all size-3 permutations of the unique values in that list, you can do the following.

import itertools

l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

for permutation in itertools.permutations(list(set(l)),3):
    print permutation,  

Output:

(1, 2, 3) (1, 2, 4) (1, 2, 5) (1, 3, 2) (1, 3, 4) (1, 3, 5) (1, 4, 2) (1, 4, 3) (1, 4, 5) (1, 5, 2) (1, 5, 3) (1, 5, 4) (2, 1, 3) (2, 1, 4) (2, 1, 5) (2, 3, 1) (2, 3, 4) (2, 3, 5) (2, 4, 1) (2, 4, 3) (2, 4, 5) (2, 5, 1) (2, 5, 3) (2, 5, 4) (3, 1, 2) (3, 1, 4) (3, 1, 5) (3, 2, 1) (3, 2, 4) (3, 2, 5) (3, 4, 1) (3, 4, 2) (3, 4, 5) (3, 5, 1) (3, 5, 2) (3, 5, 4) (4, 1, 2) (4, 1, 3) (4, 1, 5) (4, 2, 1) (4, 2, 3) (4, 2, 5) (4, 3, 1) (4, 3, 2) (4, 3, 5) (4, 5, 1) (4, 5, 2) (4, 5, 3) (5, 1, 2) (5, 1, 3) (5, 1, 4) (5, 2, 1) (5, 2, 3) (5, 2, 4) (5, 3, 1) (5, 3, 2) (5, 3, 4) (5, 4, 1) (5, 4, 2) (5, 4, 3)   

Hope this helps. :)

Sravan K Ghantasala
0
>>> from random import shuffle
>>> L = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> x=3
>>> shuffle(L)
>>> zip(*[L[i::x] for i in range(x)])
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]

You could also use a generator expression instead of the list comprehension

>>> zip(*(L[i::x] for i in range(x)))
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
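For what it's worth, the slicing trick works because `L[i::x]` takes every x-th element starting at index i; zipping the x slices back together regroups the shuffled list into x-element tuples. A small sketch (Python 3, where `zip` returns an iterator):

```python
import random

L = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
x = 3
random.shuffle(L)
# L[0::3], L[1::3], L[2::3] interleave the shuffled list;
# zip truncates to the shortest slice, so leftover elements are dropped
groups = list(zip(*(L[i::x] for i in range(x))))
```

With 14 input elements and x = 3, this yields 4 groups of 3 and silently discards the 2 leftovers -- one reason the deque-based answer below handles the tail differently.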
John La Rooy
  • Your outputs have duplicate values, eg. `(2, 2, 1)`, `(1, 4, 4)`. – Asad Saeeduddin Oct 16 '13 at 04:03
  • @Asad -- That's OK. The restriction is that if the input list has 20 items, after you've picked all 20 items out of it, you have the same distribution of numbers that you put in -- Only now randomly grouped. – mgilson Oct 16 '13 at 04:06
  • @Asad. The question isn't clear. Most answers seem to be interpreting the requirement as not duplicated from the original list – John La Rooy Oct 16 '13 at 04:06
  • 1
    @mgilson No it isn't. To quote the OP: "I want to create a list of x random elements from this list **where none of the chosen elements are the same**". – Asad Saeeduddin Oct 16 '13 at 04:08
  • @Asad -- You're misunderstanding that. Notice that even OP's result code has multiple 5's and 2's. Granted, they're spread out across different sublists. The point is that each element (think about the element's ID, assuming all are unique, not the elements values) is picked from the list once. – mgilson Oct 16 '13 at 04:15
  • @mgilson The OP's result doesn't have multiple `5`s and `2`s in the *same* sample, however. To be honest, I don't think the OP actually cares about a list of samples anyway. It seems the wording: "So **possible results** if x = 3 would be", just describes possible output values for a single sample. The important part is generating a single sample based on random selection weighted by frequency, while ensuring the values in the sample are unique. The OP's comment on your post also seems to point to this interpretation. – Asad Saeeduddin Oct 16 '13 at 04:21
  • @mgilson Sorry, I misread FMc's comment as being from the OP. Do take a look at the OP's comment on glasslion's (now deleted) answer though. – Asad Saeeduddin Oct 16 '13 at 04:32
  • The OP commented on glasslion's deleted answer. That comment is still ambiguous in my opinion. ie. `sample` can return duplicates from the original list when called repeatedly. and you can read it that it can allow duplicates in the result lists too. – John La Rooy Oct 16 '13 at 04:32
  • @gnibbler -- Yeah, I looked at that one and wondered about it. My point is that at very least there's ambiguity here. – mgilson Oct 16 '13 at 04:37
  • @mgilson I agree that it isn't entirely clear cut, which makes it all the more unfortunate that the OP has disappeared. – Asad Saeeduddin Oct 16 '13 at 04:42
  • @Asad -- yeah, these things tend to happen sometimes -- Oh well. I had fun with my horrific lambda monstrosity. I'd never use it in my code ... But it did make for a nice brain teaser. – mgilson Oct 16 '13 at 04:44
0

Starting with a way to do it without list comprehensions:

import random
import itertools


alphabet = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]


def alphas():
    while True:
        yield random.choice(alphabet)


def filter_unique(iterable):
    found = set()
    for a in iterable:
        if a not in found:
            found.add(a)
            yield a


def dice(x):
    while True:
        yield itertools.islice(
            filter_unique(alphas()),
            x
        )

for i, output in enumerate(dice(3)):
    print list(output)
    if i > 10:
        break

The part where list comprehensions have trouble is `filter_unique()`, since a list comprehension has no 'memory' of what it has already output. A possible workaround is to keep generating candidate outputs until one of acceptable quality is found, as @DSM suggested.
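For completeness, that rejection-style workaround might look like this as a single expression (a sketch, not a recommendation; Python 3):

```python
import random

alphabet = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

# keep drawing 3-element candidates until one has no duplicate values;
# drawing with random.choice preserves the frequency weighting
sample = next(
    c
    for c in iter(lambda: [random.choice(alphabet) for _ in range(3)], None)
    if len(set(c)) == 3
)
```

The `iter(callable, sentinel)` form produces an endless stream of candidate samples (the `None` sentinel is never returned), and `next` stops at the first duplicate-free one.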

peterdemin
0

The slow, naive approach is:

import random
def pick_n_unique(l, n):
    res = set()
    while len(res) < n:
        res.add(random.choice(l))
    return list(res)

This will pick elements and only quit when it has n unique ones:

>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[2, 3, 4]
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[3, 4, 5]

However it can get slow if, for example, you have a list with thirty 1s and one 2, since once it has a 1 it'll keep spinning until it finally hits a 2. A better approach is to count the number of occurrences of each unique element, choose a random one weighted by occurrence count, remove that element from the count list, and repeat until you have the desired number of elements:

import collections
import random

def weighted_choice(item__counts):
    total_counts = sum(count for item, count in item__counts.items())
    which_count = random.random() * total_counts
    for item, count in item__counts.items():
        which_count -= count
        if which_count < 0:
            return item
    raise ValueError("Should never get here")

def pick_n_unique(items, n):
    item__counts = collections.Counter(items)
    if len(item__counts) < n:
        raise ValueError(
            "Can't pick %d values with only %d unique values" % (
                n, len(item__counts)))

    res = []
    for i in xrange(n):
        choice = weighted_choice(item__counts)
        res.append(choice)
        del item__counts[choice]
    return tuple(res)

Either way, this is a problem not well-suited to list comprehensions.

Claudiu
0

With the setup:

from random import shuffle
from collections import deque

l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2] 

This code:

def getSubLists(l,n):
    shuffle(l) #shuffle l so the elements are in 'random' order
    l = deque(l,len(l)) #create a structure with O(1) insert/pop at both ends
    while l: #while there are still elements to choose
        sample = set() #use a set (O(1) membership test) to check for duplicates
        skipped = 0 #consecutive duplicates pushed back this round
        while len(sample) < n and skipped < len(l): #stop once only duplicates remain
            top = l.pop() #get the top value in l
            if top in sample:
                l.appendleft(top) #add it to the back of l for a later sample
                skipped += 1
            else:
                sample.add(top) #it isn't in sample already so use it
                skipped = 0
        yield sample #yield the sample

You end up with:

for s in getSubLists(l,3):
    print s
>>> 
set([1, 2, 5])
set([1, 2, 3])
set([2, 4, 5])
set([2, 3, 4])
set([1, 4])
HennyH