11

Since everything in Ruby is an object, do Ruby variables store the value or the address of immediate types (read: primitives)? By contrast, C stores primitive values directly in variables.

Humming
  • Following on from @akuhn's response: since in MRI integers are objects that already have object_ids (or, more specifically, memory addresses) assigned, it makes sense that assignment merely stores the address rather than duplicating the value. This is clear since 1.object_id and a = 1; a.object_id give the same result. – Humming Dec 29 '16 at 21:39

3 Answers

9

NB: all of the following applies to the default Ruby, which internally uses YARV ("Yet Another Ruby VM"). Other Rubies such as JRuby may use different internal representations.

Good question.

Ruby uses tagged pointers for integers; everything else is stored as a reference to an object.

How do they work? One bit in the pointer is used as a tag: if that bit is set, the rest of the pointer is interpreted as an integer; otherwise it is interpreted as an address.

This works because some bits in a pointer are not used. The lowest bits of a memory address are usually zero, since most systems only hand out aligned addresses. With 4-byte alignment, for example, the 2 lowest bits become available as tags. If the integer tag bit is set, the other 31 bits of a 32-bit pointer are interpreted as an integer.

You can see this by looking at the object_id of integers: in binary, it is the value shifted left by one with the tag bit set.

20.to_s(2) # => "10100"
20.object_id.to_s(2) # => "101001"
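
The encoding described above can be modelled in plain Ruby. This is only an illustrative sketch: the names tag_integer, tagged_integer? and untag_integer are made up for this example, and the real work happens in YARV's C code, not in Ruby.

```ruby
# Illustrative model of a one-bit integer tag; not YARV's actual C code.
TAG_BIT = 1

# Pack an integer into a tagged word: shift left, then set the low bit.
def tag_integer(n)
  (n << 1) | TAG_BIT
end

# True when the low bit marks the word as an immediate integer.
def tagged_integer?(word)
  (word & TAG_BIT) == TAG_BIT
end

# Recover the integer by shifting the tag bit away.
def untag_integer(word)
  word >> 1
end

word = tag_integer(20)
puts word.to_s(2)             # => 101001
puts tagged_integer?(word)    # => true
puts untag_integer(word)      # => 20
```

Because Ruby's >> is an arithmetic shift, the round trip also works for negative numbers.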

On some systems two tag bits are used, and floating-point numbers are then represented using the other tag bit. Some special objects such as nil, true, and false are represented by reserved numbers that are unlikely to be valid memory addresses. Symbols are also represented as tagged integers internally, but with a different bitmask than actual integers.

All other values are represented as pointers.

Fun fact: you can inspect all of this yourself using the ObjectSpace module.

(0..100).each { |n| p([n, ObjectSpace._id2ref(n)]) rescue nil }

On my system this prints

[0, false]
[1, 0]
[2, 2.0]
[3, 1]
[5, 2]
[6, -2.0]
[7, 3]
[8, nil]
[9, 4]
[10, 2.0000000000000004]
[11, 5]
[13, 6]
[14, -2.0000000000000004]
[15, 7]
[17, 8]
[18, 2.000000000000001]
[19, 9]
[20, true]
[21, 10]
[22, -2.000000000000001]
[23, 11]
...
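
The pattern in that listing can be summarised in a small classifier. This is a hypothetical helper, not part of Ruby: classify_id and SPECIAL_IDS are names invented here, and the constants are simply read off the output above (a 64-bit YARV with tagged floats), so other versions or platforms may differ.

```ruby
# Hypothetical classifier for the small ids printed above. The constants are
# taken from that listing, not from any specification.
SPECIAL_IDS = { 0 => :false, 8 => :nil, 20 => :true }.freeze

def classify_id(id)
  return SPECIAL_IDS[id] if SPECIAL_IDS.key?(id)
  return :integer if id.odd?              # low bit set: tagged integer
  return :float if (id & 0b11) == 0b10    # low bits 10: tagged float
  :other                                  # anything else, e.g. a heap object
end

[0, 1, 2, 8, 20, 21, 22].each do |id|
  puts "#{id}: #{classify_id(id)}"
end
```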
akuhn
  • 2
    Yes, good question, but also good answer. If more than one bit is unused, I assume the others are not used to have tag pointers for `true`, `false` and `nil`. Correct? Have you been reading [Pat Shaughnessy](http://patshaughnessy.net/ruby-under-a-microscope)? – Cary Swoveland Dec 29 '16 at 20:47
  • Yes, and you can actually inspect that yourself, let me expand on that. – akuhn Dec 29 '16 at 20:49
  • @caryswoveland updated with loop over all object IDs. – akuhn Dec 29 '16 at 20:51
  • @caryswoveland and haven't read that book, thanks for the pointer, just ordered it! All my knowledge is from Smalltalk 80 and it looks like it is still not dated :) – akuhn Dec 29 '16 at 21:03
  • 1
    Another vote for "Ruby under the microscope". Awesome book, and enough to learn for a lifetime! – Eric Duminil Dec 29 '16 at 21:31
  • 1
    Most of what you claim about "Ruby" are actually private internal implementation details of one specific version of one specific implementation of Ruby, namely YARV pre-2.2. JRuby does *not* use tagged pointers. JRuby uses full 64 bits for fixnums, even on 32 bit systems. YARV 2.2+ on 64 bit systems has flonums (62 bit floats encoded as tagged pointers). YARV 2.4 removes the user-visible distinction between `Fixnum`s and `Bignum`s and treats them as private internal implementation details, invisible compiler optimizations that are not exposed to the programmer, just like flonums. – Jörg W Mittag Dec 30 '16 at 01:52
  • As a part of the introduction of flonums, YARV changed the object IDs of `nil`, `false` and `true`. – Jörg W Mittag Dec 30 '16 at 01:53
5

tl;dr: it doesn't matter, you can't tell, and since you can't tell, the Ruby Language Specification doesn't say anything about it, which allows the different implementors to make different choices, and they do in fact make different choices.


It doesn't matter.

The only way you could tell the difference is by modifying the object, but since all immediate objects are immutable, there is no way for you to tell one way or the other.
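
A short sketch of why the difference is unobservable. The method name rebind_demo is made up for this example, and the frozen? results are what recent CRuby versions report; treat them as an assumption for other implementations.

```ruby
# "Changing" an integer variable rebinds it to another object; it never
# modifies the integer that another variable still refers to.
def rebind_demo
  a = 1
  b = a
  a += 1      # rebinds a; b is untouched
  [a, b]
end

p rebind_demo        # => [2, 1]

# Immediate values report themselves as frozen, so no mutating call can
# reveal whether a variable holds a copy or a shared reference.
puts 1.frozen?
puts :name.frozen?
puts nil.frozen?
```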

As it turns out, different Ruby implementations treat them differently, and there is nothing wrong with that. For example, YARV stores Integers either as tagged pointers (called fixnums) or as object references (called bignums), depending on size. Even within YARV, the very same number may be stored as either, because it uses 63 bits for fixnums on 64 bit systems but only 31 bits on 32 bit systems. JRuby, on the other hand, doesn't use tagged pointers (it doesn't use pointers at all, since Java simply doesn't have them) and uses the full 64 bits for fixnums regardless of machine word size.
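
To make those bit widths concrete, here is the arithmetic as a sketch. It only computes the range of a signed two's-complement integer of the given width; signed_range is a name invented for this example, and nothing here queries the running interpreter.

```ruby
# Range of a signed two's-complement integer that fits in `bits` bits.
def signed_range(bits)
  Range.new(-(2**(bits - 1)), 2**(bits - 1) - 1)
end

# Bit widths quoted above: 63 (64 bit YARV), 31 (32 bit YARV), 64 (JRuby).
{ "YARV, 64 bit" => 63, "YARV, 32 bit" => 31, "JRuby" => 64 }.each do |name, bits|
  r = signed_range(bits)
  puts "#{name}: #{r.begin} .. #{r.end}"
end
```

A number such as 2**40 fits the first and third ranges but not the second, which is how the same Integer can be a fixnum on one system and a bignum on another.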

Likewise, YARV on 64 bit systems uses a 62 bit tagged pointer format for Floats which fit in 62 bits (which they call flonums), but on 32 bit systems and for larger Floats, it uses a different encoding.

Jörg W Mittag
2

Every object in Ruby has an address. You can find it with obj.object_id << 1. Here is where everything gets interesting, though:

a = 'Hello'
puts a.object_id << 1   # 140608279312760
a += 'World'
puts a.object_id << 1   # 140608279205240
b = 'HelloWorld'
puts b.object_id << 1   # 140608271586720
c = a
puts c.object_id << 1   # 140608279205240

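This shows that objects are stored by address in Ruby. Additionally, each object has its own address; two variables share one only after assignment with =, as with c = a. However, note that in our previous example, 140608279205240 only refers to that specific 'HelloWorld' object: b holds an equal string at a different address. Reassigning either a or c (as a += 'World' does) creates a new object and rebinds only that variable, so the other is unaffected; mutating the shared object in place (for example with <<) would be visible through both.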
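
One caveat is worth spelling out with a sketch: rebinding and in-place mutation behave differently once two variables name the same string. The method name alias_demo is invented for this example, and equal? is used instead of object_id because it tests object identity directly.

```ruby
def alias_demo
  a = String.new("Hello")   # explicit unfrozen string, safe to mutate
  c = a                     # c and a now name the same object

  a << "World"              # mutates that object in place
  shared = a.equal?(c)      # still one object
  seen_through_c = c.dup    # snapshot of what c shows at this point

  a += "!"                  # builds a new String and rebinds only a
  [shared, seen_through_c, a.equal?(c), a, c]
end

p alias_demo
# => [true, "HelloWorld", false, "HelloWorld!", "HelloWorld"]
```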

Eli Sadoff
  • Sorry for the downvote, but alas this answer is not correct. Integers are tagged pointers and the other are plain addresses, no need for shifting by one. – akuhn Dec 29 '16 at 21:05
  • 1
    I got the shifting by `1` from [here](http://stackoverflow.com/questions/2402228/accessing-objects-memory-address-in-ruby), but on the whole you have a better answer anyways. – Eli Sadoff Dec 29 '16 at 21:06
  • Hmm, I am pretty sure pointers are not shifted. Otherwise they would become tagged floating point numbers. Maybe someone on that other question misunderstood the tagged pointers. Lemme look into that tonight … – akuhn Dec 29 '16 at 21:09
  • I stand corrected, you are right. The internal representation is unshifted but `object_id` returns shifted memory addresses, for whatever reason. Can you remove the example with the integers though? – akuhn Dec 30 '16 at 09:22
  • @akuhn That's completely fine. Thanks for actually looking into it and coming back. You're the kind of person SO deserves. – Eli Sadoff Dec 30 '16 at 15:33