Is there a way in Scala to obtain a stream/view/iterator for an immutable collection (e.g. List or Vector) which will traverse the collection in random order? As far as I understand Random.shuffle(coll) creates a copy, and it is not a memory-efficient way. Is Random.shuffle(coll.view) (or coll.stream or coll.iterator) a better approach? Does this create a noticeable CPU overhead? Or is there some more appropriate way?
Asked
Active
Viewed 1,550 times
2 Answers
2
Shuffling algorithms need to move randomly around in the collection and need to remember past choices somehow. The immutable collections are not very speedy with random access (O(n)
instead of O(1)
for List
, arguably O(1)
for Vector
but the constant factor is large).
Copying the collection, in comparison, is almost always the wise thing to do.

Rex Kerr
- 166,841
- 26
- 322
- 407
-
In (my) case when memory is more valuable than CPU copying is not acceptable. So I need the less CPU-intensive approach. Does Random.shuffle(myVector.iterator) copy the whole vector? – Uniqus Aug 13 '14 at 06:37
-
@Uniqus See line 108 in https://github.com/scala/scala/blob/v2.11.2/src/library/scala/util/Random.scala#L1 (new ArrayBuffer created) – elm Aug 13 '14 at 06:58
-
@Uniqus - If you have enough entries for another set of pointers to be an issue, I'm pretty sure you won't want to go from a runtime of `O(n)` to traverse the randomized list to `O(n^2)`. When `n` is big, `O(n^2)` is pretty awful (seconds -> weeks). – Rex Kerr Aug 13 '14 at 08:15
-
Yep, storing past choices will still eat memory (not at once, but with time), so it will just postpone a potential problem. So the only acceptable way is indeed to make a copy. Well, thanks to everyone, question is answered. – Uniqus Aug 13 '14 at 09:27
-
There is [a (possible) solution](http://stackoverflow.com/a/20194928/1410299) in C++, but I decided to stick with copying. – Uniqus Aug 13 '14 at 09:34
-
@Uniqus - Finding cycles that fit nicely within/near your sequence length is not a problem solved in general in that answer, and when you do solve it you get _one_ shuffle of the data. If you want to shuffle your data differently, you need another. So I wouldn't really consider that an answer. – Rex Kerr Aug 13 '14 at 21:45
2
If collections copying is taken into account, consider shuffling an Array
of indices to a Vector
of interest, for instance like this,
val myVector: Vector[MyObject] = ...
val shuffledIdx = util.Random.shuffle(0 until myVector.size)
This copies a sequence of Int
, likely lighter than dedicated objects. Then
shuffledIdx.map { idx => task (myVector(idx)) }
iterates / maps over each item in myVector
in a random order.

elm
- 20,117
- 14
- 67
- 113