14

See this simple Ruby class:

require 'byebug'

class Foo
  def run
    byebug

    puts defined?(bar)
    puts bar.inspect

    bar = 'local string'

    puts defined?(bar)
    puts bar.inspect
  end

  def bar
    'string from method'
  end
end

Foo.new.run

When running this class the following behavior can be observed in the debugger's console:

    $ ruby byebug.rb

    [2, 11] in /../test.rb
        2:
        3: class Foo
        4:   def run
        5:     byebug
        6:
    =>  7:     puts defined?(bar)
        8:     puts bar.inspect
        9:
       10:     bar = 'local string'
       11:

At the breakpoint the debugger returns the following values:

    (byebug) defined?(bar)
    "local-variable"
    (byebug) bar.inspect
    "nil"

Note that - although the debugger's breakpoint is in line #5 - it already knows that there will be a local variable bar defined in line #10 that will shadow the method bar and the debugger is actually not able anymore to call the bar method. What is doesn't know at this point is that the string 'local string' will be assign to bar. The debugger returns nil for bar.

Let's continue with the original code in the Ruby file and look at its output:

    (byebug) continue
    method
    "string from method"
    local-variable
    "local string"

At run time in line #7 Ruby still knowns that bar is indeed a method and it is still able to call it in line #8. Then l ine #10 actually defines the local variable that shadows the method with the same name and tTherefore Ruby returns like expected in line #12 and #13.

Questions: Why does the debugger return different values than the original code? It seems like it is able to look into the future. Is this considered a feature or a bug? Is this behavior documented?

spickermann
  • 100,941
  • 9
  • 101
  • 131
  • I wouldn't say it's Ruby, but `byebug`. If you wrap it in `rescue NoMethodError` and `puts defined?(bar)` before and after the assignment, you will see that it's `"method"` before that line and `"local-variable"` after it. – ndnenkov Mar 09 '17 at 14:06
  • Note that Byebug's double arrow already points to line 12, not 11. – Stefan Mar 09 '17 at 14:09
  • 1
    @Stefan, I though the same as well, but it's not that. If you put another statement that does something on a separate line between the two, you still get the same result. – ndnenkov Mar 09 '17 at 14:10
  • @ndn "same result" means that Byebug reports `"local-variable"` whereas a `p defined?(bar)` before the assignment would print `"method"`? – Stefan Mar 09 '17 at 14:12
  • @ndn I had the same thought about `byebug`, but the ruby debugger shows the same behaviour. Drop the code into `test.rb` (removing byebug). Then run `ruby -r debug test.rb`. On the breakpoint, inspect local variables with `v l`, and `bar => nil` does appear – gwcodes Mar 09 '17 at 14:12
  • @Stefan, exactly. – ndnenkov Mar 09 '17 at 14:14
  • @gwcodes Ruby recognizes `bar` as a local variable at the beginning of the method (`local_variables` would return `[:bar]`). But it is only treated as such after the assignment, i.e. `bar` resolves to the method beforehand and to the local variable afterwards. – Stefan Mar 09 '17 at 14:14
  • @gwcodes, indeed it is not `byebug` specific. Reproducible with 2.4.0's `binding.irb` as well. – ndnenkov Mar 09 '17 at 14:20
  • @Stefan, you seem to be correct. Do you know if this is something introduced with newer versions of Ruby? My entire life was a lie! xd – ndnenkov Mar 09 '17 at 14:22
  • @ndn the mystical "at the parser stage". In fact, I have no idea how that actually works. Is there a good book about Ruby's internals? – Stefan Mar 09 '17 at 14:30
  • 2
    @Stefan: yes, there is. Pick up a copy of "ruby under a microscope" – Sergio Tulentsev Mar 09 '17 at 14:30
  • @SergioTulentsev, I've heard of it, but never read it. Is this specific topic expanded there? – ndnenkov Mar 09 '17 at 14:31
  • 2
    @SergioTulentsev book arrives tomorrow, thanks! :-) – Stefan Mar 09 '17 at 14:32
  • 1
    @ndn: it's been a while since I read it, but I remember it describing method parsing and execution in the most detailed way. Down to opcodes. :) – Sergio Tulentsev Mar 09 '17 at 14:32
  • @SergioTulentsev, awesome, I'll get it as well. :) – ndnenkov Mar 09 '17 at 14:34
  • (_they should pay me for this!_) – Sergio Tulentsev Mar 09 '17 at 14:35
  • @ndn: congrats with golden ruby badge :) – Sergio Tulentsev Mar 09 '17 at 14:41
  • @SergioTulentsev, aw thanks. (: – ndnenkov Mar 09 '17 at 14:41
  • I will just drop it here: https://bugs.ruby-lang.org/issues/10314 – Aleksei Matiushkin Mar 09 '17 at 16:40
  • I refactored the question to make it clearer. The part about the exception was misleading and caused confusion, because the exception had actually nothing to do with the problem caused by the debugger. – spickermann Mar 09 '17 at 17:47
  • @spickermann, you can simplify the question even further. I think the gist of it is [this](https://gist.github.com/ndnenkov/eb8bc970b727236317f39f0d677d3030). Debuggers just work on the given binding as far as I can tell. – ndnenkov Mar 10 '17 at 07:50
  • So it seems like this is not an debugger issue, but an issue with `binding`. From the docs: *Objects of class Binding encapsulate the execution context at some particular place in the code and retain this context for future use. The variables, methods, value of self, and possibly an iterator block that can be accessed in this context are all retained.* That seems to be not 100% correct, because ist is not *at some particular place in the code* (the binding knows more than the place in the code) and the methods and variables are not exactly retained. Should we considered this a being a bug? – spickermann Mar 10 '17 at 08:04
  • @spickermann, I wouldn't say the description is inaccurate. It depends on your interpretation. The *place* in code does not necessarily mean the *line* in code. I doubt it's a bug, it's just leaky internals. It's not a regression, going as far back as 1.8.7 you still get the same behaviour. – ndnenkov Mar 10 '17 at 08:21
  • @spickermann It's not an issue with `binding` but rather `eval`, and no this isn't a bug: it would just be incredibly difficult to prevent this behavior. The code gets evaluated first evaluated into YARV bytecode and a local variable table, and then at runtime `eval()` runs in the context of the existing byte code and local variable table, which stores a reference to the local variable you're seeing. – fny Mar 16 '17 at 19:14
  • Updated with some instruction dumps, I can see why it's happening, but I can't tell whether this is an intentional or understood behavior. – fny Mar 20 '17 at 18:16

1 Answers1

6

Whenever you drop into a debugging session, you're effectively executing an eval against the binding at that spot in the code. Here's a simpler bit of code that recreates the behavior that's driving you nuts:

def make_head_explode
  puts "== Proof bar isn't defined"
  puts defined?(bar)   # => nil

  puts "== But WTF?! It shows up in eval"
  eval(<<~RUBY)
    puts defined?(bar) # => 'local-variable'
    puts bar.inspect   # => nil
  RUBY

  bar = 1
  puts "\n== Proof bar is now defined"
  puts defined?(bar)   # => 'local-variable'
  puts bar.inspect     # => 1
end

When the method make_head_explode is fed to the interpreter, it's compiled to YARV instructions, a local table, which stores information about the method's arguments and all local variables in the method, and a catch table that includes information about rescues within the method if present.

The root cause of this issue is that since you're compiling code dynamically at runtime with eval, Ruby passes the local table, which includes an unset variable enry, to eval as well.

To start, let's use a use a very simple method that demonstrates the behavior we'd expect.

def foo_boom
  foo         # => NameError
  foo = 1     # => 1
  foo         # => 1
end

We can inspect this by extracting the YARV byte code for the existing method with RubyVM::InstructionSequence.disasm(method). Note I'm going to ignore trace calls to keep the instructions tidy.

Output for RubyVM::InstructionSequence.disasm(method(:foo_boom)) less trace:

== disasm: #<ISeq:foo_boom@(irb)>=======================================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] foo
0004 putself
0005 opt_send_without_block <callinfo!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 pop
0011 putobject_OP_INT2FIX_O_1_C_
0012 setlocal_OP__WC__0 2
0016 getlocal_OP__WC__0 2
0020 leave                                                            ( 253)

Now let's walk through the trace.

local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] foo

We can see here that YARV has identified we have the local variable foo, and stored it in our local table at index [2]. If we had other local variables and arguments, they'd also appear in this table.

Next we have the instructions generated when we try to call foo before its assigned:

  0004 putself
  0005 opt_send_without_block <callinfo!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
  0008 pop

Let's dissect what happens here. Ruby compiles function calls for YARV according to the following pattern:

  • Push receiver: putself, referring to top-level scope of function
  • Push arguments: none here
  • Call the method/function: function call (FCALL) to foo

Next we have the instructions for setting at getting foo once it becomes a global variable:

0008 pop
0011 putobject_OP_INT2FIX_O_1_C_
0012 setlocal_OP__WC__0 2
0016 getlocal_OP__WC__0 2
0020 leave                                                            ( 253)

Key takeaway: when YARV has the entire source code at hand, it knows when locals are defined and treats premature calls to local variables as FCALLs just as you'd expect.

Now let's look at a "misbehaving" version that uses eval

def bar_boom
  eval 'bar'     # => nil, but we'd expect an errror
  bar = 1         # => 1
  bar
end

Output for RubyVM::InstructionSequence.disasm(method(:bar_boom)) less trace:

== disasm: #<ISeq:bar_boom@(irb)>=======================================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] bar
0004 putself
0005 putstring        "bar"
0007 opt_send_without_block <callinfo!mid:eval, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0010 pop
0013 putobject_OP_INT2FIX_O_1_C_
0014 setlocal_OP__WC__0 2
0018 getlocal_OP__WC__0 2
0022 leave                                                            ( 264)

Again we see a local variable, bar, in the locals table at index 2. We also have the following instructions for eval:

0004 putself
0005 putstring        "bar"
0007 opt_send_without_block <callinfo!mid:eval, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0010 pop

Let's dissect what happens here:

  • Push receiver: again putself, referring to top-level scope of function
  • Push arguments: "bar"
  • Call the method/function: function call (FCALL) to eval

Afterward, we have the standard assignment to bar that we'd expect.

0013 putobject_OP_INT2FIX_O_1_C_
0014 setlocal_OP__WC__0 2
0018 getlocal_OP__WC__0 2
0022 leave                                                            ( 264)

Had we not had eval here, Ruby would have known to treat the call to bar as a function call, which would have blown up as it did in our previous example. However, since eval is dynamically evaluated and the instructions for its code won't be generated until runtime, the evaluation occurs in the context of the already determined instructions and local table, which holds the phantom bar that you see. Unfortunately, at this stage, Ruby is unaware that bar was initialized "below" the eval statement.

For a deeper dive, I'd recommend reading Ruby Under a Microscope and the Ruby Hacking Guide's section on Evaluation.

fny
  • 31,255
  • 16
  • 96
  • 127