0

For example, given n, which may be 1000000 or 100000000, to find out

0 ^ 1 ^ 2 ^ 3 ^ ... ^ n

there can be several ways:

p (0..n).inject(:^)
p 0.upto(n).inject(:^)
p (n+1).times.inject(:^)

(is there any other common ways to do this in Ruby?) Which of the common ways to loop through a range in Ruby actually creates an array and therefore may cause huge memory usage? (as a general rule to remember).

On a Mac, are there ways either inside of the Ruby program and by using the OS to tell that the memory use is in fact minimal (O(1) space) to verify that no actual array is created?

P.S. one way is to make n quite big such as 100000000, run it, and use ps v in another shell to see that memory is at a low % (as compared to running a program with (0..n).to_a.inject(:^)), but I am not sure if there is a better way.

nonopolarity
  • 146,324
  • 131
  • 460
  • 740
  • 1
    In general, Ruby won't create an array unless it really has to, see https://stackoverflow.com/q/6487747/479863 (which is about 1.9.2 but it is highly unlikely that the behavior has changed for the worse). – mu is too short Aug 19 '19 at 00:45

2 Answers2

3

None of those three create an array. The first creates a Range, the last two create an Enumerator.

Proof (in MRI only, tested on 2.6.3p62):

# Does not create an array:
ruby -e 's = {}; GC.start; p "Before: #{ObjectSpace.count_objects(s)[:T_STRING]}"; ("AAAA".."ZZZZ").each { |x| p "During: #{ObjectSpace.count_objects(s)[:T_STRING]}" if x == "ZZZZ" }'
# => "Before: 8417"
# => "During: 16979"

# Creates an array:
ruby -e 's = {}; GC.start; p "Before: #{ObjectSpace.count_objects(s)[:T_STRING]}"; ("AAAA".."ZZZZ").to_a.each { |x| p "During: #{ObjectSpace.count_objects(s)[:T_STRING]}" if x == "ZZZZ" }'
# => "Before: 8417"
# => "During: 780125"

The only difference between the two is #to_a inserted before #each. The exact numbers will change from run to run, but the magnitude should be obviously different. I went with strings rather than integers because small integers are not implemented as objects and cannot be counted by Objectspace::count_objects.

EDIT: but large integers are. :)

ruby -e 'base = 4611686018427387904; s = {}; GC.start; p "Before: #{ObjectSpace.count_objects(s)[:T_BIGNUM]}"; base.upto(base + 10000).each { |x| p "During: #{ObjectSpace.count_objects(s)[:T_BIGNUM]}" if x == base }'
# => "Before: 3"
# => "During: 4"

ruby -e 'base = 4611686018427387904; s = {}; GC.start; p "Before: #{ObjectSpace.count_objects(s)[:T_BIGNUM]}"; base.upto(base + 10000).to_a.each { |x| p "During: #{ObjectSpace.count_objects(s)[:T_BIGNUM]}" if x == base }'
# => "Before: 3"
# => "During: 10005"
Amadan
  • 191,408
  • 23
  • 240
  • 301
  • I think we can infer it... although using numbers may be more relevant to the common use case. I wonder why the big increase of number of objects in the first case. Is it because the string objects created but not garbage collected. In the second case, why that many objects... 26**4 is only 456976 – nonopolarity Aug 19 '19 at 06:33
  • No idea, maybe String#succ creates an intermediate value or something. But that's an implementation detail. I added an example with integers (they just need to be large enough to exit the fixnum range). – Amadan Aug 19 '19 at 06:42
  • how come you call `GC.start` ? isn't it automatic? – nonopolarity Aug 19 '19 at 07:07
  • It's automatic but unpredictable. I wanted to start with as clear a slate as possible, given that counting live objects is what the example is about. – Amadan Aug 19 '19 at 07:10
0

Together with @Amadan's answer and another post on showing memory use and then a even better one, the following can show the memory use inside the loop:

def show_memory_used(s = "")
  puts "%.1fMB used %s" % [`ps -o rss= -p #{$$}`.to_f/1024, s]
end

show_memory_used("at the beginning")

n = 10000000

(0..n).each{|i| show_memory_used("during loop 1") if i == n}

0.upto(n){|i| show_memory_used("during loop 2") if i == n}

(n+1).times.to_a.each.each{|i| show_memory_used("during loop 3") if i == n}

Sample output:

6.0MB used at the beginning
6.0MB used during loop 1
6.0MB used during loop 2
6.0MB used during loop 3

change any of those lines to something like:

(0..n).to_a.each{|i| show_memory_used() if i == n}

to actually create an array, and you can see a big increase in memory used:

82.7MB used during loop 1
nonopolarity
  • 146,324
  • 131
  • 460
  • 740