0

Just as a load test, I was playing with different data structures in Scala. Just wondering what it takes to work or even create a one billion length array. 100 million seems to be no problem, of course there's no real magic about the number 1,000,000,000. I'm just seeing how far you can push it.

I had to bump up memory on most of the tests. export JAVA_OPTS="-Xms4g -Xmx8g"

// insanity begins ...
val buf = (0 to 1000000000 - 1).par.map { i => i }.toList
// java.lang.OutOfMemoryError: GC overhead limit exceeded

However preallocating an ArrayInt works pretty well. It takes about 9 seconds to iterate and build the object. Interestingly, doing almost anything with ListBuffer seems to automatically take advantage of all cores. However, the code above will not finish (at least with 8gb Xmx).

I understand that this is not a common case and I'm just messing around. But if you had to pull some massive thing into memory, is there a more efficient technique? Is Array with type as efficient as it gets?

squarism
  • 3,242
  • 4
  • 26
  • 35
  • 1
    I had performed some similar measurements some time back. See http://stackoverflow.com/a/6320133/257449. – huynhjl Apr 01 '13 at 03:22

2 Answers2

3

The per-element overhead of a List is considerable. Each element is held in a cons cell (case class ::) which means there is one object with two fields for every element. On a 32-bit JVM that's 16 bytes per element (not counting the element value itself). On a 64-bit JVM it's going to be even higher.

List is not a good container type for extremely large contents. Its primary feature is very efficient head / tail decomposition. If that's something you need then you may just have to deal with the memory cost. If it's not, try to choose a more efficient representation.

For what it's worth, I consider memory overhead considerations to be one thing that justifies using Array. There are lots of caveats around using arrays, so be careful if you go that way.

Randall Schulz
  • 26,420
  • 4
  • 61
  • 81
1

Given that the JVM can sensibly arrange an Array of Ints in memory, if you really need to iterate over them it would indeed be the most efficient approach. It would generate much the same code if you did exactly the same thing with Java.

Sean Parsons
  • 2,832
  • 21
  • 17