My goal is to replace methods in the String class with other methods that do additional work (this is for a research project). This works for many methods by writing code in the String class similar to

alias_method :center_OLD, :center
def center(*args)
  r = send(:center_OLD, *args)
  # do some work here
  # return something
end

For some methods, I need to handle a Proc as well, which is no problem. However, for the scan method, invoking it has the side effect of setting special global variables from the regular expression match. As documented, these variables are local to the thread and the method.

Unfortunately, some Rails code makes calls to scan which makes use of the $& variable. That variable gets set inside my version of the scan method, but because it's local, it doesn't make it back to the original caller which uses the variable.
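
To make the scoping concrete, here is a minimal sketch (the method names helper_match and caller_view are made up for illustration) of how a match made inside one method is not visible through $& in the method that called it:

```ruby
# Hypothetical helper: performs a match and reports $& from its own frame.
def helper_match
  "hello world" =~ /l./
  $&                      # the matched text, as seen inside this method
end

# Hypothetical caller: has made no match of its own.
def caller_view
  inner = helper_match
  [inner, $&]             # second element is what the caller sees in $&
end

result = caller_view
```

The first element is the match as seen by the helper; the second stays nil because the caller's frame never had a match of its own. This is the same situation as a wrapper scan calling the original scan.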

Does anyone know a way to work around this? Please let me know if the problem needs clarification.

If it helps at all, all the uses I've seen so far of the $& variable are inside a Proc passed to the scan function, so I can get the binding for that Proc. However, the user doesn't seem to be able to change $& at all, so I don't know how that will help much.

Current Code

class String
  alias_method :scan_OLD, :scan
  def scan(*args, &b)
    begin

      sargs = [:scan_OLD] + args

      if b.class == Proc
        r = self.send(*sargs, &b)
      else
        r = self.send(*sargs)
      end
      r

    rescue => error
      puts error.backtrace.join("\n")
    end
  end
end

Of course I'll do more things before returning r, but even this is problematic, so for simplicity we'll stick with it. As a test case, consider:

"hello world".scan(/l./) { |x| puts x }

This works fine both with and without my version of scan. With the "vanilla" String class this produces the same thing as

"hello world".scan(/l./) { puts $&; }

Namely, it prints "ll" and "ld" and returns "hello world". With the modified string class it prints two blank lines (since $& was nil) and then returns "hello world". I'll be happy if we can get that working!

bchurchill
  • You're running into the problem of global variables: they can change anywhere and affect everything that needs to see them. – the Tin Man Oct 27 '13 at 00:26
  • That's actually the opposite of what's going on -- the "global" variables aren't in the binding they need to be in. (They aren't really global, that's just what the documentation calls them. If someone can explain why I'd like to hear). – bchurchill Oct 27 '13 at 00:30
  • Ah, I see what you're saying. Interesting. It might be interesting to compare the behavior from 1.8.7 to 1.9.3 to 2.0 and see if there was a change somewhere along the line. – the Tin Man Oct 27 '13 at 00:35
  • Would it be appropriate to hack on Object instance evaluation inside your version of the scan method? Something like `Object.$& = $&`. Before that you would need to define `attr_accessor :$&` and possibly override the reader. – Waterlink Oct 30 '13 at 20:35
  • Ouch, good luck on this one. It would be much simpler if those "special variables" were real globals. I'm not really good at this, but it seems that [MRI Ruby's parser](https://github.com/ruby/ruby/blob/trunk/parse.y#L10260) treats those variables differently from the other globals; maybe if you could find out how the parser evaluates the special variables, you'd have a clue about how to access or write them in the first place. But I bet you will have to mess with C or the parser for this... – m_x Oct 31 '13 at 20:07
  • @Waterlink that sounds legit! Could you give more details on how that would work? Is :$& just an attribute of self inherited from Object or something like that? m_x: Thanks for the info -- I'm really hoping to avoid messing with the parser or ruby internals, but if it comes down to that... Andrew: I'll update my question in a moment. – bchurchill Oct 31 '13 at 20:34

2 Answers

You cannot set $&, because it is derived from $~, the last MatchData. However, $~ can be set and that actually does what you want. The trick is to set it in the block binding.
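
A small sketch of that relationship, using only core Ruby (the method name restore_last_match is hypothetical): assigning a saved MatchData to $~ changes what $& reports afterwards.

```ruby
# Hypothetical demo method: shows that $& follows whatever $~ currently holds.
def restore_last_match
  saved = /l./.match("hello world")  # MatchData for the first "l." match
  "zzz" =~ /z/                       # a later match replaces $~
  overwritten = $&                   # $& now reflects the later match
  $~ = saved                         # $~ accepts a MatchData (or nil)
  [overwritten, $&]                  # $& is derived from the restored $~
end

restored = restore_last_match
```

Writing to $& directly is not possible, which is why the assignment goes through $~.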

The patched scan below is inspired by the old Ruby implementation of Pathname.
(The new implementation is in C and does not need to care about Ruby frame-local variables.)

class String
  alias_method :scan_OLD, :scan
  def scan(*args, &block)
    sargs = [:scan_OLD] + args

    if block
      self.send(*sargs) do |*bargs|
        Thread.current[:string_scan_matchdata] = $~
        eval("$~ = Thread.current[:string_scan_matchdata]", block.binding)
        yield(*bargs)
      end
    else
      self.send(*sargs)
    end
  end
end

The saving of the thread-local (well, actually fiber-local) variable seems unnecessary since it is only used to pass the value and the thread never reads any other value than the last one set. It probably is there to restore the original value (most likely nil, because the variable did not exist).

One way to avoid thread-locals altogether is to create a setter for $~ as a lambda (but it does create a lambda for each call):

self.send(*sargs) do |*bargs|
  eval("lambda { |m| $~ = m }", block.binding).call($~)
  yield(*bargs)
end

With any of these, your example works!
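
For a check that does not touch String itself, here is a rough sketch of the same lambda idea as a standalone helper. The names scan_with_backref and collect_matches are made up for this example, and it assumes a Ruby where eval against block.binding can assign $~ in the block's defining frame, as described above.

```ruby
# Hypothetical helper (not part of the patch above): forwards to String#scan
# and copies each match into the frame where the caller's block was defined.
def scan_with_backref(str, pattern, &block)
  return str.scan(pattern) unless block

  # Build the setter once per call; it assigns $~ in the block's own frame.
  set_last_match = eval("lambda { |m| $~ = m }", block.binding)

  str.scan(pattern) do |*bargs|
    set_last_match.call($~)   # $~ here is this helper's frame-local match
    block.call(*bargs)
  end
end

# Hypothetical caller whose block relies on $& instead of its argument.
def collect_matches
  seen = []
  scan_with_backref("hello world", /l./) { seen << $& }
  seen
end

matches = collect_matches
```

Building the setter once per call, rather than once per yielded match, is a small variation on the snippet above. With the test case from the question, matches should hold "ll" and "ld".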

eregon
  • Looks promising! Let me try that out. – bchurchill Nov 02 '13 at 09:07
  • @bchurchill How did it go? – eregon Nov 02 '13 at 19:54
  • Looks like it works! It turns out in Ruby 2.0 there's a way to get the binding of the caller via DebugInspector; take a look at http://stackoverflow.com/questions/1356749/can-you-eval-code-in-the-context-of-a-caller-in-ruby. The fact that I could set $~ made everything tick though! – bchurchill Nov 03 '13 at 20:28

I wrote some simple code to reproduce the problem:

"hello world".scan(/l./) { |x| puts x }
"hello world".scan(/l./) { puts $&; }

class String
   alias_method :origin_scan, :scan

   def scan *args, &b
      args.unshift :origin_scan
      @mutex ||= Mutex.new
      begin
         self.send *args do |a|
            break if !block_given?
            @mutex.synchronize do
               p $& 
               case b.arity
               when 0
                  b.call
               when 1
                  b.call a
               end
            end
         end
      rescue => error
         p error, error.backtrace.join("\n")
      end
   end
end

"hello world".scan(/l./) { |x| puts x }
"hello world".scan(/l./) { puts $& }

I found the following: the value of $& changes inside the call function. Just before call, $& contains a valid value, but inside the block it becomes invalid. I guess this is due to how the stack and these variables are saved and restored when the process/thread context changes, because the call function probably can't access the local state of scan.

I see two options: the first is to avoid using these global variables in the redefined functions, and the second is to dig deeper into the Ruby sources.

Малъ Скрылевъ