
Say I have a list of items:

l1 = ['a','b','c','d','e','f','g']

Now, what I wish to do is randomly split the contents of this list into n lists (say n=3) with well-defined sizes (say l2 has length 3, l3 also has length 3 and l4 has length 1), such that no element is repeated. For example:

l2 = ['a','d','e']
l3 = ['b','f','g']
l4 = ['c']

How can such a thing be achieved? Thanks.

ayhan
Rahul Dev

2 Answers


One approach would be to randomly shuffle the list and then split it into the sizes you want:

import random

l1 = ['a', 'b', 'c', 'd','e','f','g']

# put the list into a random order
random.shuffle(l1)

l2 = l1[:3]  # first three elements
l3 = l1[3:6]  # second three elements
l4 = l1[6:]  # final element

print(l2)
print(l3)
print(l4)

# Sample output:
# ['d', 'e', 'a']
# ['g', 'b', 'c']
# ['f']
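The same shuffle-then-slice idea can be wrapped in a helper that takes any list of sizes instead of hard-coded slice bounds. This is a minimal sketch: `split_by_sizes` is a hypothetical name, and unlike the code above it shuffles a copy so the original `l1` keeps its order.

```python
import random
from itertools import accumulate


def split_by_sizes(items, sizes, rng=random):
    """Randomly partition `items` into chunks with the given sizes.

    Hypothetical helper: shuffle a copy, then slice it at the running
    totals of `sizes`. Each element ends up in exactly one chunk.
    """
    if sum(sizes) != len(items):
        raise ValueError("sizes must add up to len(items)")
    shuffled = list(items)      # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    # Running totals give the end index of each chunk, e.g. [3, 3, 1] -> [3, 6, 7]
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return [shuffled[s:e] for s, e in zip(starts, ends)]


l1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
l2, l3, l4 = split_by_sizes(l1, [3, 3, 1])
print(l2, l3, l4)
```

Passing `rng=random.Random(0)` (or any other seed) makes the split repeatable, which can help when testing.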
user94559
  • If performance is an issue, you can also do `i = np.random.permutation(len(l1))` and then, after converting the list with `l1 = np.array(l1)`, `l2 = l1[i[:3]]` etc. (indexing with an index array only works on a NumPy array, not a plain list). The in-place shuffle is pretty slow on large lists. The downside is that if you need the sublists to be memory-contiguous (i.e. you have some vectorized operation you're doing on them afterward that will suffer from hunting through memory for the elements of the array), you'll need to rearrange them anyway. – Daniel F Jul 14 '17 at 06:31
  • What do you mean by 'vectorized operation'? – Rahul Dev Jul 14 '17 at 06:33
  • Basically (*very* basically) any operation without a `for` loop (even a "hidden" one). Something that operates on the whole list/array/etc all at once. Those types of operations can work better on data that is in one block in memory without hunting all over your RAM, so doing the (slow) in-place `shuffle` at the beginning is worthwhile. If your follow-on functions are consuming the lists one element at a time anyway, it's not worth rearranging them to be contiguous in RAM. Usually I'd expect an array of strings to be consumed sequentially. – Daniel F Jul 14 '17 at 07:03
  • Unless, of course, they're getting fed into some machine-learning algorithm as training/validation/test data. Then being contiguous would be a big help. – Daniel F Jul 14 '17 at 07:07
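The permutation approach from the comments could look roughly like the sketch below. It assumes NumPy is available; the `sizes` list and the use of `np.split` are illustrative choices, not taken from the comment. `rng.permutation` and `np.split` are standard NumPy calls.

```python
import numpy as np

l1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
sizes = [3, 3, 1]                       # illustrative target sizes

rng = np.random.default_rng()           # pass a seed, e.g. default_rng(0), for repeatable splits
idx = rng.permutation(len(l1))          # random ordering of the indices 0..6
arr = np.asarray(l1)                    # index-array lookups need an array, not a plain list
# arr[idx] builds a new, contiguous array in shuffled order;
# np.split cuts it at the running totals of all but the last size ([3, 6]).
l2, l3, l4 = np.split(arr[idx], np.cumsum(sizes)[:-1])
print(l2, l3, l4)
```

Because `arr[idx]` copies the data into a new array, the resulting pieces are contiguous, which matters for the vectorized or machine-learning use cases mentioned above.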

This splits the list into n non-empty sublists of random sizes. It works for any n <= the length of the list.

import random


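def get_random_sublist(n, li):
    # Shuffle in place (this reorders the caller's list), then cut it into n pieces
    random.shuffle(li)
    start = 0
    # k = number of pieces still to come after this one; the exclusive upper
    # bound len(li) - k + 1 leaves at least k elements for them
    for k in range(n - 1, 0, -1):
        tmp = random.randrange(start + 1, len(li) - k + 1)
        yield li[start:tmp]
        start = tmp
    yield li[start:]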


l1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

for i in get_random_sublist(3, l1):
    print(i)

Sample output (varies from run to run):

['d', 'b']
['c']
['f', 'g', 'a', 'e']
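The loop above picks each cut point in turn, so not every split is equally likely (with 7 items and n=3, the first piece has length 1 with probability 1/5 no matter how the rest is divided). A possible alternative, sketched below under a hypothetical name, draws all n - 1 cut points at once with `random.sample`, so every split into n non-empty pieces has the same probability. It also shuffles a copy instead of the caller's list.

```python
import random


def random_cut_sublists(n, li, rng=random):
    """Split `li` into `n` non-empty sublists of random sizes.

    Hypothetical alternative to get_random_sublist: all n - 1 cut points are
    drawn at once, so every split into n non-empty pieces is equally likely.
    """
    if not 1 <= n <= len(li):
        raise ValueError("need 1 <= n <= len(li)")
    shuffled = list(li)                       # leave the caller's list untouched
    rng.shuffle(shuffled)
    # Choose n - 1 distinct cut positions strictly inside the list, then sort them.
    cuts = sorted(rng.sample(range(1, len(shuffled)), n - 1))
    bounds = [0] + cuts + [len(shuffled)]
    return [shuffled[a:b] for a, b in zip(bounds, bounds[1:])]


l1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
for part in random_cut_sublists(3, l1):
    print(part)
```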
Himaprasoon