Loop breaking off when checking for duplicates in a list?

Question

I'm trying to generate a list using random results from another list, and I want there to be no duplicates (with one exception) when I do so. The problem occurs when I check for duplicates - it automatically breaks the loop when it finds one and I don't know why.

As far as I can tell, everything seems to be correct. I've run the code through pythontutor.com/visualise, I've tried different pieces of code to check for duplicates, I've changed the loops into for loops, while loops, range loops. I tested out my definition for turning_point, I even copy-pasted it into the loop itself instead of using it as a function, I tried changing the position of the 'if' statement, etc. I've been at this for an entire night and I still can't troubleshoot the problem.

Edit: I don't actually want such a heavy weight on a particular instance (in this case, "conclusion"), I just did that to test out the duplicates check. In reality the weights are closer to 3, 1, 1, 2, 1, etc. The other thing is that in my actual code, the action_tables are 43 values long instead of just 7.


#first list to draw random items from
action_table = ["conclusion", "none", "confrontation", "protector", "crescendo", "destroy the thing", "meta"]

#code to draw random items from first list
#first variable weighted to guarantee duplicates for testing
def turning_point():
    turning_point = (random.choices(action_table, [500, 1, 1, 1, 1, 1, 1]))
    return turning_point

#generate second list called 'plot_points'
plot_points = []
for x in range(int(input("How many Turning Points to generate? "))):
    tp = turning_point()
#the code below is what I got from this site
#added tp != to allow duplicates of "none" result
    if any(plot_points.count(tp) > 1 for tp in plot_points) and tp != "none": 
        continue
#results get added to the plot_points list:
    plot_points.append(tp)
print(plot_points)

If I remove the line that checks for duplicates, this is what I get:

[['conclusion'], ['conclusion'], ['meta'], ['conclusion'], ['conclusion']]

If I don't remove the line, this is what I get:

[['conclusion'], ['conclusion']]

What I'd like to get is something like:

[['conclusion'], ['none'], ['none'], ['meta'], ['protector']]

Are you sure you want to produce a list with nested 1-element lists? Why not try to generate `['conclusion', 'none', 'none', 'meta', 'protector']`? — Martijn Pieters, Jul 07 '19 at 17:21
I didn't realise random.choices() returns a list - I was attempting to return a single value! I also didn't notice the nested 1-element lists - I thought they were individual values (again, I'm fairly new to Python). That clears it up - how would I change the code for turning_point() to only generate values and not an entire (single element) list? Edit: The reason I did it this way was because I need the results to be weighted, I couldn't find any other code to bias the results without doing it this way. — Raw Grave, Jul 07 '19 at 17:27
You can't easily mix *weighted* and *must be unique*. You ask for a list of 5 elements out of 7 choices, but nothing but `"none"` can be repeated? Then just add an extra `"none"` to your input list, shuffle that list list and pick the first 5 elements. — Martijn Pieters, Jul 07 '19 at 17:43

Martijn Pieters · Accepted Answer · 2019-07-08T16:12:31.293

The error is here:

tp != "none"

tp is always a list with one element, because random.choices() returns a list with a single element by default. From the documentation for random.choices():

random.choices(population, weights=None, *, cum_weights=None, k=1) Return a k sized list of elements chosen from the population with replacement.

With k left at 1, tp is going to be a 1-element list each time, and can never be equal to "none". It will be equal to ["none"], or ["conclusion"], and so forth, instead. That means that `tp != "none" is always true.

Next, your any() test only kicks in if there is more than one nested list with the currently selected value, so at least 2. At that point, you start skipping anything that appeared twice, because the tp != "none" is always true:

>>> plot_points = [["conclusion", "conclusion"]]
>>> tp = ["conclusion"]
>>> any(plot_points.count(tp) > 1 for tp in plot_points)
True
>>> tp != "none"
True

Your weightings for the choices given make it very, very unlikely that anything other than "conclusion" is picked. For your 7 options, ["conclusion"] will be picked 500 out of 506 times you call your turning_point() function, so the above scenario will occur most of the time (976562500000 out of every 1036579476493 experiments will turn up ["conclusion"] 5 times in a row, or about 33 out of every 35 tests). So you'll extremely rarely will see any of the other options be produced twice, let alone 3 times (only 3 out of every 64777108 tests will repeat any of the other options three times or more).

If you must produce a list in which nothing repeats except for none, then there is no point in weighting choices. The moment "conclusion" has been picked, you can't pick it again anyway. If the goal is to make it highly likely that a "conclusion" element is part of the result, then just make that a separate swap at the end, and just shuffle a list of the remaining choices first. Shuffling lets you cut the result down to size and the first N elements will all be random, and unique:

>>> import random
>>> action_table = ["confrontation", "protector", "crescendo", "destroy the thing", "meta"]
>>> random.shuffle(action_table)  # changes the list order in-place
>>> action_table[:3]
['meta', 'crescendo', 'destroy the thing']

You could pad out that list with "none" elements to make it long enough to meet the length requirements, and then insert a conclusion in a random position based on the chances that one should be included:

def plot_points(number):
    action_table = ["none", "confrontation", "protector", "crescendo", "destroy the thing", "meta"]
    if number > 6:
        # add enough `"none"` elements
        action_table += ["none"] * (number - 6)
    random.shuffle(action_table)
    action_table = action_table[:number]
    if random.random() > 0.8:
        # add in a random "conclusion"
        action_table[random.randrange(len(number))] = "conclusion"
    return action_table

Note that this is a pseudo-weighted selection; conclusion is selected 80% of the time, and uniqueness is preserved with only "none" repeated to pad out the results. You can’t have uniqueness for the other elements otherwise.

However, if you must have

unique values in the output list (and possibly repeat "none")
weighted selection of inputs

Then you want a weighted random sample selection without replacement. You can implement this using standard Python libraries:

import heapq
import math
import random

def weighted_random_sample(population, weights, k):
    """Chooses k unique random elements from a population sequence.

    The probability of items being selected is based on their weight.

    Implementation of the algorithm by Pavlos Efraimidis and Paul
    Spirakis, "Weighted random sampling with a reservoir" in 
    Information Processing Letters 2006. Each element i is selected
    by assigning ids with the formula R^(1/w_i), with w_i the weight
    for that item, and the top k ids are selected. 

    """ 
    if not 0 <= k < len(population):
        raise ValueError("Sample larger than population or negative")
    if len(weights) != len(population):
        raise ValueError("The number of weights does not match the population")

    key = lambda iw: math.pow(random.random(), 1 / iw[1])
    decorated = heapq.nlargest(k, zip(population, weights), key=key)
    return [item for item, _ in decorated]

Use this to select your items if you need 7 items or fewer, otherwise and extra "none" values and just shuffle (as all 7 items end up selected anyway):

def plot_points(number):
    action_table = ["conclusion", "none", "confrontation", "protector", "crescendo", "destroy the thing", "meta"]

    if number > len(action_table):
        # more items than are available
        # pad out with `"none"` elements and shuffle
        action_table += ["none"] * (number - len(action_table))
        random.shuffle(action_table)
        return action_table

    weights = [3, 1, 1, 1, 2, 2, 1]
    return weighted_random_sample(action_table, weights, number)

Demo:

>>> plot_points(5)
['none', 'conclusion', 'meta', 'crescendo', 'destroy the thing']
>>> plot_points(5)
['conclusion', 'destroy the thing', 'crescendo', 'meta', 'confrontation']
>>> plot_points(10)
['none', 'crescendo', 'protector', 'confrontation', 'meta', 'destroy the thing', 'none', 'conclusion', 'none', 'none']

Of course, if your real action_table is much larger and you disallow picking more plot points than you have actions, there is no need to pad things out at all and you'd just use weighted_random_sample() directly.

Apologies, I should have clarified, "none" is an actual string value in and of itself and not a 'none' value that you would get when you return no value at all. As for the weightings, I exaggerated them for testing, but in reality what I'm going for is something closer to 3, 1, 1, 1, 2 etc. - the other thing is that the original 'action_table' is much longer, in fact there are 43 values. Thanks for clarifying with the any() test. — Raw Grave, Jul 07 '19 at 17:58
Would never have thought to shuffle the list and draw from there - would have liked to have weighted results but I'll take this - thanks! — Raw Grave, Jul 07 '19 at 18:14
@RawGrave you can repeat values in the input list( repeated values can appear that many times more. — Martijn Pieters, Jul 07 '19 at 18:20
@RawGrave and you want to update your question to be clearer on what the expected outcome is vis-a-vis the weightings. — Martijn Pieters, Jul 07 '19 at 18:21
I got confused when you said "Shuffling lets you cut the result down to size and the first N elements will all be random, and unique" - what part of the code allows it to be unique? Will repeat values in the input list, that works very well for me since the weights are not very big (5 is the maximum). — Raw Grave, Jul 07 '19 at 18:27
@RawGrave all shuffle does is rearrange the order of the input list. If the input list only contains unique elements, the output will be unique. If you repeat elements in the input list, chances are you’ll get duped in the output list. — Martijn Pieters, Jul 07 '19 at 23:58
@RawGrave This was bothering me enough to add a proper weighted sample without replacement implementation. — Martijn Pieters, Jul 08 '19 at 01:06

Monaheng Ramokhoro · Answer 2 · 2019-07-07T18:02:58.433

Hello I would suggest you make the following changes to your code:

I tried my best to make as few changes as possible.

def turning_point():
  turning_point = (random.choices(action_table))
  return turning_point

Your weights were too disproportionate, you would get 'conclusion' way too many times.

for x in range(int(input("How many Turning Points to generate? "))):
    tp = turning_point()
    print(tp)
#the code below is what I got from this site
#added tp != to allow duplicates of "none" result
    plot_points.append(tp)
    if any(plot_points.count(tp) > 1 for tp in plot_points) and tp != ["none"]: 
        plot_points.pop()
#results get added to the plot_points list:
    # else: 
        # plot_points.append(tp)
    print(plot_points)
print(plot_points)

You will notice two main differences:

I changed the point at which you appended to your list and I replaced the continue statement with a pop() function. The problem is, you were checking the 'counts' too late(as an example, your loop would insert 'conclusion' inoto plot_points, then end and tp would change value, say it changes to 'protector', by the time you check the any..). So appending and then checking the count, then popping if you have too many made more sense to me.
I changed your tp != "none" to tp != ["none"], because remember you're working with single member lists.
I added some print statements purely so you can visually see what is happening at each iteration

Loop breaking off when checking for duplicates in a list?

2 Answers2

Hello I would suggest you make the following changes to your code:

Your weights were too disproportionate, you would get 'conclusion' way too many times.

You will notice two main differences: