
I want to create a set of randomly selected indices from an input collection observations:

case class Observation(id: Long, metric1: Double)

val observations: Seq[Observation]

val NSamples = 100
val indices = // A set of randomly selected indices of the observations
              // WITHOUT replacement

The complication is that, to avoid re-selecting an existing index when drawing a new one (via myRandom.nextInt(observations.length)), we need access to the previously drawn ones, which is afaik not possible during the initial generation of a sequence.

An outline of what I'm looking for is shown here.

Most preferred (but I doubt it can be done):

val sampledIndices: Seq[Int] = for (randInd <- 0 until NSamples) yield {
    // some random non-repeated index in [0, observations.length)
}

But the following is a second choice:

val randomIndices = mutable.ArrayBuffer[Int]()
for (randInd <- 0 until NSamples) {
   randomIndices += // some random non-repeated index in [0, observations.length)
}

What to avoid: multiple vars, which is what I keep running into so far.

WestCoastProjects

2 Answers


I think this does what you want: shuffle all the indices and keep the first NSamples of them.

val sampledIndices: Seq[Int] = scala.util.Random.shuffle(0 until observations.size).take(NSamples)
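
When the collection is much larger than the sample, shuffling every index can be wasteful. A minimal sketch of an alternative, assuming rejection sampling into an immutable Set is acceptable: the previously drawn indices are threaded through a tail-recursive helper, so no var is needed. The name sampleDistinct and its parameters are hypothetical, chosen only for this illustration.

```scala
import scala.annotation.tailrec
import scala.util.Random

// Hypothetical helper (not from the original answer): draws `k` distinct
// indices from [0, n) by rejection, without shuffling the whole range.
def sampleDistinct(n: Int, k: Int, rng: Random = new Random()): Set[Int] = {
  require(k >= 0 && k <= n, "k must lie in [0, n]")

  @tailrec
  def loop(seen: Set[Int]): Set[Int] =
    if (seen.size == k) seen
    else loop(seen + rng.nextInt(n)) // adding a duplicate leaves the set unchanged

  loop(Set.empty)
}
```

With the names from the question this would be called as sampleDistinct(observations.length, NSamples). Note that a Set does not keep the draw order, and that rejections become frequent as k approaches n, in which case the shuffle is likely the better choice.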
Simon
  • That is perfect for the particular use case ;) - and thus upvoting. I am hoping for even more: a general solution to this kind of problem - which bugs me from time to time. – WestCoastProjects Jun 05 '18 at 16:40

Another option is an unfold function, which builds a Stream by producing, at each step, a value together with the state used to compute the next value:

// f maps the current state to Some((value, nextState)), or None to end the Stream
def unfold[A, S](z: S)(implicit f: S => Option[(A, S)]): Stream[A] = {
  f(z) match {
    case None => Stream.empty[A]
    case Some((value, state)) => value #:: unfold(state)
  }
}

Then to create your list:

unfold(Random)(a => Some((a.nextInt, a))).take(NSamples).toList
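
As written, the call above can repeat values. A minimal sketch of how the same unfold pattern could carry the already-emitted indices as its state, so that each new draw is checked against them. It assumes Scala 2.12, where Stream is current (on 2.13+ LazyList plays that role and Stream only triggers deprecation warnings). The function is passed explicitly here rather than implicitly, and the name distinctIndices is hypothetical.

```scala
import scala.util.Random

// Same shape as the unfold above, but with `f` as an ordinary parameter.
def unfold[A, S](z: S)(f: S => Option[(A, S)]): Stream[A] =
  f(z) match {
    case None                 => Stream.empty
    case Some((value, state)) => value #:: unfold(state)(f)
  }

// Hypothetical helper: lazily yields distinct random indices from [0, n).
// The state is the set of indices emitted so far.
def distinctIndices(n: Int, rng: Random): Stream[Int] =
  unfold[Int, Set[Int]](Set.empty) { seen =>
    if (seen.size >= n) None // every index has been emitted
    else {
      // Redraw until we hit an index that has not been emitted before.
      val next = Iterator.continually(rng.nextInt(n)).find(i => !seen(i)).get
      Some((next, seen + next))
    }
  }
```

For the question's case this would read distinctIndices(observations.length, new Random()).take(NSamples).toList. The draw order is preserved, at the cost of more redraws as the set of seen indices fills up.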
SCouto
  • Why is `f` implicit? – Yuval Itzchakov Jun 05 '18 at 12:55
  • To avoid sending it again in the recursive call: case Some((value, state)) => value#::unfold(state)(f) – SCouto Jun 05 '18 at 12:56
  • Really? ¯\_(ツ)_/¯ – Yuval Itzchakov Jun 05 '18 at 13:00
  • I just noticed the `#::unfold`. That is completely new to me and going to google it – WestCoastProjects Jun 05 '18 at 16:44
  • Can you explain how this avoids repeating indices? – WestCoastProjects Jun 05 '18 at 16:47
  • It's an operator that prepends an element to a Stream. You can check the API here: https://www.scala-lang.org/api/current/scala/collection/immutable/Stream.html – SCouto Jun 05 '18 at 16:49
  • So my point is that this is a nice template but does not precisely answer the question. In addition, `NSamples` is not respected as the limit on the values returned by `scala.util.Random` (vs. the `java.util.Random` as requested). I am trying to fix these quibbles now. Hold on. – WestCoastProjects Jun 05 '18 at 16:57
  • OK, here are the fixes: `val rand = new java.util.Random() ; unfold(rand)((a => Some(a.nextInt(NSamples), a))).take(NSamples).toList`. Actually, this *still* does not take care of the *without replacement* concern, which is the major point of the question. So this is helpful but not an actual answer. – WestCoastProjects Jun 05 '18 at 17:00