5

What is the semantic difference between size and sizeIs? For example,

List(1,2,3).sizeIs > 1 // true
List(1,2,3).size > 1   // true

Luis mentions in a comment that

...on 2.13+ one can use sizeIs > 1 which will be more efficient than size > 1 as the first one does not compute all the size before returning

Add size comparison methods to IterableOps #6950 seems to be the pull request that introduced it.

Reading the scaladoc

Returns a value class containing operations for comparing the size of this $coll to a test value. These operations are implemented in terms of sizeCompare(Int)

it is not clear to me why is sizeIs more efficient than regular size?

Mario Galic
  • 47,285
  • 6
  • 56
  • 98

1 Answers1

5

As far as I understand the changes.

The idea is that for collections that do not have a O(1) (constant) size. Then, sizeIs can be more efficient, specially for comparisons with small values (like 1 in the comment).

But why?
Simple, because instead of computing all the size and then doing the comparison, sizeIs returns an object which when computing the comparison, can return early.
For example, lets check the code

def sizeCompare(otherSize: Int): Int = {
  if (otherSize < 0) 1
  else {
    val known = knownSize
    if (known >= 0) Integer.compare(known, otherSize)
    else {
      var i = 0
      val it = iterator
      while (it.hasNext) {
        if (i == otherSize) return if (it.hasNext) 1 else 0 // HERE!!! - return as fast as possible.
        it.next()
        i += 1
      }
      i - otherSize
    }
  }
}

Thus, in the example of the comment, suppose a very very very long List of three elements. sizeIs > 1 will return as soon as it knows that the List has at least one element and hasMore. Thus, saving the cost of traversing the other two elements to compute a size of 3 and then doing the comparison.

Note that: If the size of the collection is greater than the comparing value, then the performance would be roughly the same (maybe slower than just size due the extra comparisons on each cycle). Thus, I would only recommend this for comparisons with small values, or when you believe the values will be smaller than the collection.

  • I assumed `size` is a constant operation, as in there was some mutable state acting as a counter which is updated on insertion/deletion, but now I realise it actually traverses the collection each time to size it (for dynamically sized collections). – Mario Galic Jul 16 '19 at 22:51
  • @MarioGalic for a **List** it does, but I believe **Vector** does store the size. If you dig in the changes of the PR, you can see that there are other overrides that use the precomputed size, thus there is no difference in using `sizeIs`. Actually, I would not recommend to use `sizeIs` with no-concrente collections, as it may be actually worse, due the extra objects. But for **Lists** _(which I use 90% of the time)_ is a huge win :) – Luis Miguel Mejía Suárez Jul 16 '19 at 22:55
  • 3
    @MarioGalic I think another problem is that `.size` doesn't even terminate for certain (potentially) infinite collections, whereas `sizeIs > 2` can be decided after two steps. – Andrey Tyukin Jul 16 '19 at 22:58