0

I have a list containing sequences of various lengths. The pattern of a sequence is as follows:

x_k, y_k, ..., x_k_i, y_k_i, ... z_k

For example, a list having 4 sequences with lengths: 3, 3, 5, and 7 is as follows:

input_list = ['x_1', 'y_1', 'z_1', 
              'x_2', 'y_2', 'z_2', 
              'x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3', 
              'x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4']

I need to shuffle the list, such that the order of sequences is shuffled, but the entries within a sequence is not shuffled.

For example, a candidate output would be as follows:

shuffled_list = ['x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3', 
                 'x_1', 'y_1', 'z_1',
                 'x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4',
                 'x_2', 'y_2', 'z_2']

One way to achieve this would be by saving each sequence as a separate list, and then having a nested list represent all the sequences. Then, one by one randomly removing a list (i.e., a sequence) from the nested list and appending the removed list's elements in the final shuffled list.

Is there a more efficient way to achieve the same?

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
SaadH
  • 1,158
  • 2
  • 23
  • 38
  • 3
    Do you know the lengths of the sequences beforehand, i.e. something like `lengths = [3, 3, 5, 7]`? If you have, you could build a list of ranges corresponding to the positions of the sequences, shuffle this list, and then build the result out of the corresponding indices. – Timus May 07 '23 at 10:54
  • Why do you think the approach you mention at the end isn't efficient? – Kelly Bundy May 07 '23 at 11:01
  • 1
    @Timus: good idea; following your approach, building the list of ranges is easy with `itertools.accumulate` and `itertools.pairwise` for example (I've become very fond of itertools in these last days :) ) – Swifty May 07 '23 at 11:03
  • @Timus, yes, the lengths of the sequences are known beforehand. – SaadH May 07 '23 at 12:30
  • @KellyBundy, not sure. Was hoping there would be a better approach :) – SaadH May 07 '23 at 12:32

3 Answers3

1

You can group your items on the first index, with the use of itertools.groupby:

from itertools import groupby

input_list = ['x_1', 'y_1', 'z_1', 
              'x_2', 'y_2', 'z_2', 
              'x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3', 
              'x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4']

def key(x):
    return x.split('_')[1]

group_dic = {k:list(g) for k,g in groupby(input_list, key=key)}

{'1': ['x_1', 'y_1', 'z_1'],
 '2': ['x_2', 'y_2', 'z_2'],
 '3': ['x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3'],
 '4': ['x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4']}

Of course you don't HAVE to make a dictionary; you can use the results however you want.

Now on to the shuffling part:

from random import shuffle

order = list(group_dic.keys())
shuffle(order)

output_list = [entry for i in order for entry in group_dic[i]]

Example output:

['x_1', 'y_1', 'z_1', 'x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4', 'x_2', 'y_2', 'z_2', 'x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3']

Edit: following on @Timus' idea, if you have a list of lengths of the subgroups, you can make a list of corresponding ranges, shuffle it, then build the output_list from the shuffled ranges; for example:

from itertools import accumulate, pairwise
from random import shuffle

lengths =  [3, 3, 5, 7]
ranges = [slice(*pair) for pair in pairwise(accumulate(lengths, initial=0))]
shuffle(ranges)
output_list = sum((input_list[range] for range in ranges), start=[])

# ['x_1', 'y_1', 'z_1', 'x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4', 'x_2', 'y_2', 'z_2', 'x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3']
Swifty
  • 2,630
  • 2
  • 3
  • 21
  • Thanks for the input @Swifty, but this approach is similar to what I have already suggested i.e., saving each sequence as a list. If I don't receive any more answers, I will mark this one as accepted. – SaadH May 07 '23 at 09:20
  • Indeed, I just wrote a process to do this efficiently from the already 'flattened' list. – Swifty May 07 '23 at 10:53
  • 1
    `sum` with lists isn't really efficient... – Kelly Bundy May 07 '23 at 11:57
  • @KellyBundy: is `[entry for range in ranges for entry in input_list[range]]` better? – Swifty May 07 '23 at 14:46
  • I'd say it is, as the sum way can take quadratic time. Though it depends on the sizes of the lists. With very few but very long lists, it can be faster by a constant factor. I might use `chain`, or a loop with `+=`, or `reduce` with `iconcat`. But the list comp with two `for` clauses is fine. – Kelly Bundy May 07 '23 at 18:17
  • Ok, thanks; I'll keep that in mind in the future. – Swifty May 07 '23 at 19:01
  • [Benchmarks](https://stackoverflow.com/a/45323085/12671057) of various solutions and inputs. – Kelly Bundy May 07 '23 at 19:36
  • @KellyBundy: very nice, thanks. – Swifty May 09 '23 at 00:52
0

If the format of entries is strictly as shown in the example, then following @Swifty's comment, the list can be shuffled using itertools.groupby.

import itertools
import random

input_list = ['x_1', 'y_1', 'z_1', 
              'x_2', 'y_2', 'z_2', 
              'x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3', 
              'x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4']

# Grouping the entries by their sequence
sequences = [list(g) for k, g in itertools.groupby(input_list, lambda x: x.split('_')[1] if '_' in x else x)]

random.shuffle(sequences)

# Flattening the shuffled list
shuffled_list = list(itertools.chain(*sequences))
SaadH
  • 1,158
  • 2
  • 23
  • 38
0

You could do it with a sort that maps the sequence identifiers to random values. Because Python's sort is stable, the items with the same sequence identifiers will remain in the same relative order:

input_list = ['x_1', 'y_1', 'z_1', 
              'x_2', 'y_2', 'z_2', 
              'x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3', 
              'x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4']

import random

seqId  = {s.split("_")[1]:random.random() for s in input_list}
input_list.sort(key=lambda s:seqId[s.split("_")[1]])

print(input_list)

['x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4', 
 'x_2', 'y_2', 'z_2', 
 'x_1', 'y_1', 'z_1', 
 'x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3']

To avoid the orverhead of sorting, you could create a list of ranges corresponding to the sequences and rebuild the list after shuffling the range list:

*ends, = {s.split("_")[1]:i for i,s in enumerate(input_list,1)}.values()
ranges = [(s,e) for s,e in zip([0]+ends,ends)]
random.shuffle(ranges)
input_list = [s for s,e in ranges for s in input_list[s:e]]

print(input_list)
['x_2', 'y_2', 'z_2', 
 'x_1', 'y_1', 'z_1', 
 'x_3_1', 'y_3_1', 'x_3_2', 'y_3_2', 'z_3', 
 'x_4_1', 'y_4_1', 'x_4_2', 'y_4_2', 'x_4_3', 'y_4_3', 'z_4']
Alain T.
  • 40,517
  • 4
  • 31
  • 51