1

I want to add probabilities to each item in a list, where that list is in another list.

Some pseudo-code:

myList = [ [a, b, c, d], [e, f, g, h], [i, j, k, l], [m, n, o], [p, q, r], [s, t, u] ]

probabilities = [ [0.6, 0.3, 0.075, 0.025], [0.6, 0.3, 0.075, 0.025], [0.6, 0.3, 0.075, 0.025], [0.55, 0.35, 0.1], [0.55, 0.35, 0.1], [0.55, 0.35, 0.1] ]

Is there any way to achieve this?

Further:

My need for this is to create another list that would look similar to the below...

newList = [ [b, e, k, o, p, s], [a, f, i, m, r, t], ... etc. ] 

where each element is chosen randomly according to the probabilities, and no two lists in newList are the same. I am not sure whether that is achievable.

My code so far:

layers = [list(Path(directory).glob("*.png")) for directory in ("dir1/", "dir2/", "dir3/", "dir4/", "dir5/", "dir6/")]

list_of_prob = [[0.6, 0.3, 0.075, 0.025], [0.6, 0.3, 0.075, 0.025], [0.6, 0.3, 0.075, 0.025], [0.6, 0.3, 0.1], [0.6, 0.3, 0.1], [0.6, 0.3, 0.1]]

rwp = [choices(layers, list_of_prob, k=????)]

rand_combinations = [([choice(k) for k in layers]) for i in choice_indices]

I am not entirely sure what k should be in choices(), e.g. the number of lists or the total number of elements across the lists. `layers` is a list of lists of image paths (.png files), in the same format as "myList" in the pseudo-code above (4 images in dir1, 4 images in dir2, 4 images in dir3, 3 in dir4, 3 in dir5, 3 in dir6).

I already have code to iterate through a list and create random images, but I want some of the images to only be generated x% of the time. Hence my original question. Sorry if I just complicated things, I tried to simplify it.

  • Rest assured what you want to do is definitely possible. Can you post the code you have written so far to try and do this? – C_Z_ Aug 18 '21 at 18:45
  • Are the elements of `myList` supposed to be strings? You need to quote them. – Barmar Aug 18 '21 at 18:48
  • Use `random.choices()` with its `weights` argument. – Barmar Aug 18 '21 at 18:49
  • I understand your question technically and it's not hard: first, re-order each list using the probabilities, then [merge them together using `itertools.zip_longest`](https://stackoverflow.com/questions/1277278/is-there-a-zip-like-function-that-pads-to-longest-length) and filter out the `None`s. But mathematically speaking, how do you order something without replacement by probabilities? Do you just remove the probability of the element chosen, sum up the rest and divide all the rest by their sum? Do you just keep picking elements until you've picked each at least once? Is that the same? – Boris Verkhovskiy Aug 18 '21 at 18:54
  • @Barmar no, they are actually image paths. I was using "a, b, c..." as pseudo code. I have tried random.choices() but it does not seem to like that syntax – Dominick Fiducia Aug 18 '21 at 18:54
  • @C_Z_ I will edit with my code – Dominick Fiducia Aug 18 '21 at 18:55
  • You must not be using it correctly. Show how you tried to use it. – Barmar Aug 18 '21 at 18:55
  • @Barmar `choices` does sampling [with replacement](https://docs.python.org/3/library/random.html#random.choices), I don't think @Dominick wants that. – Boris Verkhovskiy Aug 18 '21 at 18:56
  • @Boris He's only getting one choice each iteration, so I don't think replacement is relevant. – Barmar Aug 18 '21 at 18:58
  • @Boris good question, I can easily make all possible combinations of the list elements. So I think the question to be asked is how to sort through all possible combinations in order to make certain elements be in x% of the lists. Or, for example creating N lists with those elements/probabilities and avoiding any duplicate lists – Dominick Fiducia Aug 18 '21 at 18:59
  • @DominickFiducia if `d` appears in the first list, can it appear in the second list? In your example that doesn't happen. Can `new_list` look like `[[d, ...], [d, ...], [d, ...]]` or will the item never be repeated, e.g. `[[b, ...], [a, ...], [d, ...], [c, ...]]` – Boris Verkhovskiy Aug 18 '21 at 20:24
  • @Boris yes it can appear in any other list, so long as the combination of items in the list is not repeated again. – Dominick Fiducia Aug 18 '21 at 20:31
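
To make the `random.choices()` suggestion from the comments concrete, here is a minimal sketch. It assumes short stand-in lists rather than the real image paths, and the names `combination` and `five_from_first` are only illustrative. The point is that `k` is the number of draws taken from one population, and that the weights must be a flat list matching that population, so the call is made once per layer rather than once on the whole nested list.

```python
import random

# Stand-ins for two of the layers; the real lists would hold image paths.
layers = [["a", "b", "c", "d"], ["m", "n", "o"]]
list_of_prob = [[0.6, 0.3, 0.075, 0.025], [0.55, 0.35, 0.1]]

# One weighted pick per layer: k=1 (the default) draws a single item.
# random.choices always returns a list, hence the [0].
combination = [
    random.choices(layer, weights=weights, k=1)[0]
    for layer, weights in zip(layers, list_of_prob)
]

# A larger k draws several items from the SAME layer, with replacement.
five_from_first = random.choices(layers[0], weights=list_of_prob[0], k=5)

print(combination)
print(five_from_first)
```

Passing the nested `layers` and nested `list_of_prob` directly to `choices()` does not work, because the weights are then lists rather than numbers.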

2 Answers

1

I converted myList to strings just to make things easy.

This creates combinations and appends them to newList, skipping any combination that already exists in newList.

The while loop ends when newList holds as many combinations as myList has sublists (6 here).

import random

myList = [['a', 'b', 'c', 'd'], 
          ['e', 'f', 'g', 'h'], 
          ['i', 'j', 'k', 'l'], 
          ['m', 'n', 'o'], 
          ['p', 'q', 'r'], 
          ['s', 't', 'u']]

probabilities = [[0.6, 0.3, 0.075, 0.025], 
                 [0.6, 0.3, 0.075, 0.025], 
                 [0.6, 0.3, 0.075, 0.025], 
                 [0.55, 0.35, 0.1], 
                 [0.55, 0.35, 0.1], 
                 [0.55, 0.35, 0.1]]
newList = []

def random_list():
    combo = []
    for d, p in zip(myList, probabilities):
        choice = random.choices(d, p)
        combo.append(''.join(choice))
    return combo

while len(newList) < len(myList):
    a = random_list()
    if a not in newList:
        newList.append(a)

Results of newList:

[['b', 'f', 'k', 'm', 'q', 's'],
 ['a', 'e', 'j', 'm', 'q', 't'],
 ['b', 'f', 'k', 'm', 'q', 'u'],
 ['a', 'f', 'i', 'm', 'p', 's'],
 ['a', 'e', 'i', 'n', 'p', 't'],
 ['b', 'f', 'k', 'm', 'r', 'u']]
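
If more than a handful of unique combinations are needed, it may help to know how many exist and how likely each one is, since a retry loop like the one above slows down as rare combinations become the only ones left. The following is a rough sketch under the assumption that picks in different sublists are independent; `combo_probability` is a hypothetical helper, not part of the answer, and `math.prod` needs Python 3.8 or later.

```python
import math

myList = [['a', 'b', 'c', 'd'],
          ['e', 'f', 'g', 'h'],
          ['i', 'j', 'k', 'l'],
          ['m', 'n', 'o'],
          ['p', 'q', 'r'],
          ['s', 't', 'u']]

probabilities = [[0.6, 0.3, 0.075, 0.025],
                 [0.6, 0.3, 0.075, 0.025],
                 [0.6, 0.3, 0.075, 0.025],
                 [0.55, 0.35, 0.1],
                 [0.55, 0.35, 0.1],
                 [0.55, 0.35, 0.1]]

# Upper bound on unique combinations: one pick per sublist.
total_combinations = math.prod(len(sub) for sub in myList)

def combo_probability(combo):
    """Probability of drawing exactly this combination in one attempt."""
    p = 1.0
    for item, sub, weights in zip(combo, myList, probabilities):
        p *= weights[sub.index(item)]
    return p

most_likely = combo_probability(['a', 'e', 'i', 'm', 'p', 's'])
least_likely = combo_probability(['d', 'h', 'l', 'o', 'r', 'u'])
print(total_combinations, most_likely, least_likely)
```

Asking the loop for more than `total_combinations` results would never terminate, and asking for nearly all of them would take a very long time because the rarest combinations are drawn so seldom.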
fthomson
0

So if I understood correctly, you want to take one element of each sublist of myList based on the probabilities to get an element, and do this multiple times to get a list of lists.

With Python 3, you can do:

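import random

# myList and probabilities are the lists defined in the question.
# A set, since no two results may be the same.
result_set = set()
# The number of draws; change as needed. Duplicate draws collapse in the set,
# so the result may hold fewer entries than this.
for i in range(5):
    # A single result as a str, which is hashable.
    result = ""
    for chars, char_probabilities in zip(myList, probabilities):
        # Weighted random choice; random.choices returns a one-element list here.
        char = random.choices(chars, char_probabilities)
        result += char[0]
    # Add the finished 6-character result once per outer iteration.
    result_set.add(result)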

result_list = [list(elt) for elt in result_set]
  • This definitely seems like a step in the right direction, but how can I force 1 element from each list to end up in result_list? After running this, I see some entries in result_list have fewer than 6 elements, when I would always want 6 elements in each list that gets added to result_list. Thank you! – Dominick Fiducia Aug 18 '21 at 19:29
  • mmm, strange, this is a sample result from running that code: [['a', 'e', 'k', 'n', 'p', 's'], ['a', 'g', 'l', 'm', 'p', 't'], ['a', 'e', 'i', 'n', 'p', 's'], ['a', 'e', 'i', 'n', 'q', 'u'], ['c', 'f', 'i', 'n', 'p', 't']] – Kevin Eaverquepedo Aug 18 '21 at 19:34
  • ahh, okay! Definitely correct then, I am not using it on a list of char which might be the issue. Apologies, I didn't want to overcomplicate things in my original post... I am actually using it for a list of image paths, so objects. I'm trying to figure out currently how to use your code for a list of image paths – Dominick Fiducia Aug 18 '21 at 19:38
  • I'd have to see a minimal reproducible example to see how to adapt the solution. – Kevin Eaverquepedo Aug 18 '21 at 19:39
  • So this is one of the lists added to result_list. I chose this because it does indeed have 6 elements. Some have fewer. But these are the objects I am passing. [[WindowsPath('dir1/img1.png')], [WindowsPath('dir2/img3.png')], [WindowsPath('dir3/img2.png')], [WindowsPath('dir4/img1.png')], [WindowsPath('dir5/img1.png')], [WindowsPath('dir6/img2.png')]] – Dominick Fiducia Aug 18 '21 at 19:45
  • Saw your question update. You could simply do something like: `layers = [[str(p) for p in Path(directory).glob("*.png")] for directory in ("dir1/", "dir2/", "dir3/", "dir4/", "dir5/", "dir6/")]` to have them as strs, and you can convert them to Paths afterwards. Not the cleanest, but I'd have to completely change my approach to fit your extended question. – Kevin Eaverquepedo Aug 18 '21 at 19:49
  • I will be trying to solve this, @fthomson 's answer works quite well also. – Dominick Fiducia Aug 18 '21 at 20:03
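
For the image-path case raised in these comments, one possible adaptation is to collect each combination as a tuple instead of a joined string. Tuples are hashable, so `Path` objects can go into a set without converting them to `str` first. This is only a sketch: `unique_combinations` and its `max_attempts` guard are hypothetical names, and the file names and weights below are made up for illustration.

```python
import random
from pathlib import Path

def unique_combinations(layers, weights, n, max_attempts=100_000):
    """Draw up to n distinct combinations, one weighted pick per layer."""
    seen = set()
    attempts = 0
    # The attempt cap keeps the loop from running forever if n is too large.
    while len(seen) < n and attempts < max_attempts:
        attempts += 1
        combo = tuple(
            random.choices(layer, weights=w)[0]
            for layer, w in zip(layers, weights)
        )
        seen.add(combo)  # duplicates are silently ignored by the set
    return [list(combo) for combo in seen]

# Hypothetical file names standing in for the globbed directories.
layers = [
    [Path("dir1/img1.png"), Path("dir1/img2.png"), Path("dir1/img3.png")],
    [Path("dir2/img1.png"), Path("dir2/img2.png")],
]
weights = [[0.6, 0.3, 0.1], [0.7, 0.3]]

combos = unique_combinations(layers, weights, n=4)
print(combos)
```

Each returned list always has one entry per layer. The order of the returned combinations is arbitrary, because they pass through a set.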