Howto write efficient nested loop using ranges

Question

As a scala newbie, I tried to write a method running over some 2-dimentional data. This method is called multiple times, so performance matters.

First I coded it with for comprehension

private def sumWithRange(xEnd: Int, yEnd: Int) = {
  var sum = 0
  for {
    x <- 0 until xEnd
    y <- 0 until yEnd
  } sum += 1
  sum
}

And then with while loop:

private def sumWithWhileLoop(xEnd: Int, yEnd: Int) = {
  var sum = 0
  var x = 0
  while (x < xEnd) {
    var y = 0
    while (y < yEnd) {
      sum += 1
      y += 1
    }
    x += 1
  }
  sum
}

I was surprised how slowly the range-comprehension runs, so I wrote a ScalaMeter benchmark:

object TestDoubleLoopPerformance {
  val standardConfig = config(
    Key.exec.minWarmupRuns -> 5,
    Key.exec.maxWarmupRuns -> 10,
    Key.exec.benchRuns -> 10,
    // Key.verbose -> true
  ) withWarmer(new Warmer.Default)


  def main(args: Array[String]): Unit = {
    val whileLoopTime = standardConfig measure {
      for (iter <- 0 to 1000) {
        sumWithWhileLoop(1000, 2000)
      }
    }
    println(s"while loop time: $whileLoopTime")

    val rangeLoopTime = standardConfig measure {
      for (iter <- 0 to 1000) {
        sumWithRange(1000, 2000)
      }
    }
    println(s"range loop time: $rangeLoopTime")
  }
}

The results:

while loop time: 0.3065984 ms
range loop time: 2585.5892847 ms

In addition, with verbose output, multiple GCs are reported for the for comprehension version.

I realize that for each x, new Range is created, which explains the sluggishness,

Is there a way to modify the for-comprehension version to run faster (on par with the while loop)?

// EDIT: I tried to desugar the for comprehension:

var sum = 0
(0 until xEnd).foreach(x => {
  val yRange = 0 until yEnd
  yRange.foreach(y => sum += 1)
})

and then pull up the yRange:

var sum = 0
val yRange = 0 until yEnd
(0 until xEnd).foreach(x => {
  yRange.foreach(y => sum += 1)
})

It reports running time of circa 287ms - much better than before, but still not satisfactory

Are you sure the slowdown is due to the creation of the **Range**? I would guess it is more about the fact that calling the lambda of `foreach` requires boxing the **Ints** of the **Range**. And if that, then you can't do anything, lambdas will always box. — Luis Miguel Mejía Suárez, May 11 '21 at 21:13
Nope, not sure, I think it may be a factor, but I am a bit lost, to be frank — Lesiak, May 11 '21 at 21:14

Lesiak · Accepted Answer · 2021-05-13T08:02:43.663

1

In my, rather simplistic example, I managed to get around the problem with my own Range2D class:

class Range2D(a0: Int, a1: Int) {
  def foreach(f: ((Int, Int)) => Unit): Unit = {
    var i0 = 0
    while (i0 < a0) {
      var i1 = 0
      while (i1 < a1) {
        f(i0, i1)
        i1 += 1
      }
      i0 += 1
    }
  }
}

and

private def sumWithRange(xEnd: Int, yEnd: Int) = {
  var sum = 0
  for {
    xy <- new Range2D(xEnd, yEnd)
  } sum += 1
  sum
}

On my machine, it is circa 10X slower than a raw loop, which is a good improvement over my initial code.

Edit

I tried changing Function1[(Int, Int), Unit] to Function2[Int, Int, Unit]

I cannot use Range2D in for comprehension (I get missing parameter type error).
I can use foreach: new Range2D(xEnd, yEnd).foreach((_, _) => sum += 1)
there is little difference on my machine
generated code indeed looks better - no Tuple2 are created:

Original Range2D (decompiled to java)

private int sumWithRange(final int xEnd, final int yEnd) {
    IntRef sum = IntRef.create(0);
    (new Range2D(xEnd, yEnd)).foreach((xy) -> {
        $anonfun$sumWithRange$1(sum, xy);
        return BoxedUnit.UNIT;
    });
    return sum.elem;
}

Modified Range2D (decompiled to java)

private int sumWithRange(final int xEnd, final int yEnd) {
    IntRef sum = IntRef.create(0);
    (new Range2D(xEnd, yEnd)).foreach((x$1, x$2) -> {
        ++sum.elem;
    });
    return sum.elem;
}

edited May 13 '21 at 08:02

answered May 12 '21 at 16:14

Lesiak

22,088
2
41
65

You may get better improvements if you also create your own **Function** class that is specialized to accept two **Ints** _(rather than a **tuple**)_. That way there will be no boxing anywhere and thanks to `SAM` you can use the traditional lambda syntax. – Luis Miguel Mejía Suárez May 12 '21 at 17:04
@LuisMiguelMejíaSuárez I tried your suggestion, and added notes to my answer – Lesiak May 13 '21 at 08:03
I mean, creating your own specialized **Function** class, not using the one from the stdlib. Like [this](https://scastie.scala-lang.org/BalmungSan/8uAQZE5ATJKJ2ihNvs7m6A/2) - That class should be more efficient since there won't be boxing. – Luis Miguel Mejía Suárez May 13 '21 at 14:09
@LuisMiguelMejíaSuárez I thought there is no need to write function accepting primitive ints, as both arguments of `Function2` are marked with `@specialized(Specializable.Args)` – Lesiak May 13 '21 at 19:11
1

Oh, you are completely right. I always believed it wasn't; That is pretty cool actually! - So yeah you do not need the extra trait just use a **Function2** instead of a **Function1** of a tuple to ensure there is no boxing anywhere. – Luis Miguel Mejía Suárez May 13 '21 at 20:03

Howto write efficient nested loop using ranges

1 Answers1

Linked