
Let's say I have posts in an ordered list, sorted by date.

[<Post: 6>, <Post: 5>, <Post: 4>, <Post: 3>, <Post: 2>, <Post: 1>]

I want to break them into 3 groups and shuffle the items inside each group.

chunks = [posts[x:x+2] for x in range(0, len(posts), 2)]

Now chunks will be:

[[<Post: 6>, <Post: 5>], [<Post: 4>, <Post: 3>], [<Post: 2>, <Post: 1>]]

What is an efficient way to randomly shuffle the items inside each respective list? I could iterate through them and build each list separately, but that seems repetitive...

I want the final output to look something like:

[[<Post: 5>, <Post: 6>], [<Post: 4>, <Post: 3>], [<Post: 1>, <Post: 2>]]

or better:

[<Post: 5>, <Post: 6>, <Post: 4>, <Post: 3>, <Post: 1>, <Post: 2>]
Henry H

1 Answer


Sure. random.shuffle works in place, so looping through the sub-lists and applying it to each one does the first job. For the flattening I use a favorite trick of mine: applying sum to the sub-lists with an empty list as the start value.

import random
import itertools

chunks = [["Post: 6", "Post: 5"], ["Post: 4", "Post: 3"], ["Post: 2", "Post: 1"]]

# shuffle each sub-list in place
for c in chunks:
    random.shuffle(c)

# there you already have your list of lists with shuffled sub-lists
# now the flattening
print(sum(chunks, []))                 # simple, but slow on big lists
print(list(itertools.chain(*chunks)))  # more complex, faster on big lists

Some results:

['Post: 5', 'Post: 6', 'Post: 4', 'Post: 3', 'Post: 2', 'Post: 1']
['Post: 6', 'Post: 5', 'Post: 3', 'Post: 4', 'Post: 1', 'Post: 2']

(You said you wanted something like [[<Post: 5>, <Post: 6>, <Post: 4>, <Post: 3>, <Post: 1>, <Post: 2>]] (a list inside a list), but I suppose that's a typo: I provide a simple, flattened list.)
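One more option, as a minimal sketch: a nested list comprehension also flattens one level, with no imports and none of the repeated copying that sum does.

```python
import random

chunks = [["Post: 6", "Post: 5"], ["Post: 4", "Post: 3"], ["Post: 2", "Post: 1"]]

# shuffle each sub-list in place, as above
for c in chunks:
    random.shuffle(c)

# flatten one level: iterate over sub-lists, then over their items
flat = [post for chunk in chunks for post in chunk]
print(flat)
```

The comprehension builds the result list element by element, so it avoids the intermediate lists that sum creates at every step.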

Jean-François Fabre
  • You are right, that was a typo & solution works well. Thanks! – Henry H Sep 15 '16 at 21:59
  • `sum` is not really intended for concatenating sequences; it will create brand new sequences at each stage, meaning tons of copying and temporaries. If you want to efficiently concatenate sequences, use `itertools.chain` (to produce a generator that flattens one level) and wrap in `list` (to construct the new `list` from that generator with no temporary `list`s). `from itertools import chain`, then `print(list(chain(*chunks)))`. It's the difference between repeated string concatenation and using `''.join`. – ShadowRanger Sep 15 '16 at 22:22
  • @ShadowRanger: thanks for that. I really have to dive into `itertools`. You're right about that, although list growth performance is much better than string growth. Lists are designed to be resized. For instance, `sum` has been protected to avoid composing strings: if you try `sum("abcdef","")` you get `TypeError: sum() can't sum strings [use ''.join(seq) instead]` – Jean-François Fabre Sep 16 '16 at 13:39
  • @ShadowRanger: when you `list(chain(*chunks))` since you have to iterate through elements, isn't that the same problem ? I mean: `"".join("some string")` knows the length of the string, but since `chain` is a generator function, `list` cannot possibly know the size and it ends up adding the elements one by one, which is possibly worse that performing a `sum` 2 by 2 as in my answer (interesting chat !!) – Jean-François Fabre Sep 16 '16 at 13:43
  • @Jean-FrançoisFabre: It's true that `chain` does not allow `list` to pre-allocate the way `''.join` does. But `list` uses an overallocation strategy to minimize `realloc`s ([it's amortized `O(1)` for each element added](https://hg.python.org/cpython/file/3.5/Objects/listobject.c#l42), and a fresh construct/extend assumes [at least 8 elements could be added](https://hg.python.org/cpython/file/3.5/Objects/listobject.c#l837)); if I read the code right, for 10 element `chain` it does one `realloc`, for 100, 9 `realloc`s, for 1000, 25 `realloc`s. The allocator may not require copies each time. – ShadowRanger Sep 16 '16 at 15:05
  • @ShadowRanger so that reallocation strategy could not apply when performing `sum` on small lists? – Jean-François Fabre Sep 16 '16 at 15:07
  • And none of that requires reallocating (and setting up for GC) the `list` object itself, just its storage. By contrast, repeated `sum`'s repeated concatenation is equivalent to `+`, not `+=`, it's a form of [Painter's algorithm](https://en.wikipedia.org/wiki/Joel_Spolsky#Schlemiel_the_Painter.27s_algorithm). Every concatenation allocates a new `list`, resizes it to match the combined `list`s, copies over (incrementing ref counts) all the elements from the original `list`s, then (in this case) discarding the previous temporary `list` (doing all the decrefs required). The scaling is terrible. – ShadowRanger Sep 16 '16 at 15:12
  • For comparison, if you've got `ipython` handy, try this simple test case: `x = [[1, 2]] * 100`. Then try `%timeit sum(x, [])`, `%timeit functools.reduce(operator.iconcat, x, [])` and `%timeit list(itertools.chain.from_iterable(x))` (`from_iterable` is slightly faster than `chain(*x)`, but it's small in this case since `x` is a realized `list`, not a generator). `reduce`/`iconcat` wins (though it's less flexible; you can't choose to iterate w/o `list`ifying), `list`/`chain.from_iterable` or `chain(*)` is *just* behind, and *way* behind (5-6x longer) is `sum`. Now do `x *= 10` to see scaling. – ShadowRanger Sep 16 '16 at 15:20
  • If you repeat the test after making `x` 10x longer, `sum` takes 70x longer than its already losing speed, while the others only take 8-10x longer (the `list` overallocation strategy has fewer and fewer `realloc`s as the output size grows, so it's *just* under linear scaling in practice). – ShadowRanger Sep 16 '16 at 15:29
  • I believe you. Just the `list1 += list2` vs `list1 = list1+list2` convinced me actually. I have edited my post to provide both approaches. Thanks for the lesson. – Jean-François Fabre Sep 16 '16 at 15:31
  • @Jean-FrançoisFabre: Yar, I get a little overdetailed sometimes. Glad it was useful to someone this time. :-) BTW, I just noticed they actually [commented on your specific use of `sum` in the `sum` source](https://hg.python.org/cpython/file/3.5/Python/bltinmodule.c#l2318), explaining why they had to use `+` instead of `+=` behavior (to avoid side-effects that would alter an initial value that is referenced elsewhere), just in case you were interested. – ShadowRanger Sep 16 '16 at 15:37
  • I am interested. Thanks for the link. I'll edit my other posts where I promote `sum` on lists :) – Jean-François Fabre Sep 16 '16 at 17:40
  • @MooingRawr I think you would be interested in this discussion. So much for `sum` being the cure to all evils :) – Jean-François Fabre Sep 16 '16 at 19:59