How to find all referenced files in Ruby

Question

I want to detect all the files a Ruby file directly references, for documentation purposes. Reading the list of require statements is not enough, because some files are only imported transitively and others are imported but never used. For example:

a.rb:

require 'b'
require 'e'
class A; end
B.new; C.new

b.rb:

require 'c'
require 'd'
class B; end
C.new; D.new

c.rb:
class C; end

(d.rb and e.rb are just like c.rb)

Then the list I want to get for a.rb is b.rb, c.rb. No D or E because they are not directly referenced. Hope this makes sense!

Excluding `e.rb` and including `c.rb` are hard problems to solve (may even be equivalent to the "halting problem", I'm not certain). They essentially require full parsing of Ruby code and understanding all possible code paths. You *might* be able to make this 90% accurate though by only looking for simple lightweight dependencies (such as class definitions and basic instantiation). So is 90% accuracy acceptable? — Neil Slater, Apr 06 '13 at 18:32
@NeilSlater can you join here - http://chat.stackoverflow.com/rooms/27184/ruby-conceptual? — Arup Rakshit, Apr 06 '13 at 18:41
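
To make the "90% accurate" static idea from the comment above concrete, here is a rough sketch. It assumes every class is introduced with a plain `class Foo` or `module Foo` line and every use looks like `Foo.something`; the helper names and the in-memory `sources` hash are hypothetical, and `d.rb` and `e.rb` are filled in as one-line classes like `c.rb`. It does not parse Ruby, so metaprogramming, `const_get`, nested namespaces and the like will fool it.

```ruby
require 'set'

# Hypothetical helper: constants opened with `class Foo` / `module Foo` in a source string.
def defined_constants(source)
  source.scan(/^\s*(?:class|module)\s+([A-Z]\w*)/).flatten.to_set
end

# Hypothetical helper: capitalized names that receive a message, e.g. `Foo.new`.
def referenced_constants(source)
  source.scan(/\b([A-Z]\w*)\s*\./).flatten.to_set
end

# Given { filename => source }, list the other files that define a constant
# which `target` mentions in its own text.
def directly_referenced_files(sources, target)
  wanted = referenced_constants(sources.fetch(target))
  sources.select do |name, src|
    name != target && !(defined_constants(src) & wanted).empty?
  end.keys.sort
end

# The files from the question, held in memory for the sake of the example.
sources = {
  'a.rb' => "require 'b'\nrequire 'e'\nclass A; end\nB.new; C.new\n",
  'b.rb' => "require 'c'\nrequire 'd'\nclass B; end\nC.new; D.new\n",
  'c.rb' => "class C; end\n",
  'd.rb' => "class D; end\n",
  'e.rb' => "class E; end\n"
}
p directly_referenced_files(sources, 'a.rb') # prints ["b.rb", "c.rb"]
```

Because it only looks at the text of a.rb, it skips d.rb (referenced from b.rb only) and e.rb (required but never mentioned), which is the list asked for in the question.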

score 1 · Accepted Answer · answered Apr 07 '13 at 10:32

There's some fuzziness here about what 'used' means. Clearly d is used, since b.rb (which is also used) calls D.new at the end. If we take 'used' to mean "code was executed from that file, other than during the require process", then the following code is as close as I can get on Ruby 1.9.3:

require 'set'
def analyze(filename)
  require_depth = 0
  files = Set.new
  set_trace_func( lambda do |event, file, line, id, binding, classname|
    case event
    when 'call' then require_depth += 1 if id == :require && classname == Kernel
    when 'return' then require_depth -= 1 if id == :require && classname == Kernel
    when 'line'
      files << file if require_depth == 0
    end
  end)
  load filename
  set_trace_func nil
  files.reject {|f| f == __FILE__ || f =~ %r{/lib/ruby/site_ruby}}
end

You'd use it by running analyze 'a.rb' (assuming that all the files involved are on the load path). It uses Ruby's set_trace_func to listen to what's going on. The first part is a crude attempt to ignore everything that happens during a call to require. Then we accumulate the filename of every line of executed Ruby. The last line just clears out junk (e.g. the rubygems file that patches require).

This doesn't actually work for the test example: when B.new runs, no lines of code from b.rb are actually executed. However, if B (and C, D etc.) have initialize methods (or some other line of code that gets called) then you should get the desired result. It's pretty simplistic and is easy to fool. In particular, if you call a method on (say) B, but the implementation of that method isn't in b.rb (e.g. an accessor defined with attr_accessor), then b.rb isn't logged.
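
To see that effect in isolation, here is a small sketch. It uses TracePoint (Ruby 2.0+) rather than set_trace_func, and the file names, class names and helper methods are invented for the demo. One class has an empty body and the other has an initialize method; only the second produces a line event in its own file when instantiated.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical helper: basenames of files that produced a :line event during the block.
def files_with_line_events
  seen = Set.new
  trace = TracePoint.new(:line) { |tp| seen << File.basename(tp.path) }
  trace.enable
  begin
    yield
  ensure
    trace.disable
  end
  seen
end

# Writes two throwaway class files, loads them, then instantiates both under tracing.
def line_event_demo
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'empty_b.rb'), "class EmptyB; end\n")
    File.write(File.join(dir, 'init_b.rb'),
               "class InitB\n  def initialize\n    @ready = true\n  end\nend\n")
    load File.join(dir, 'empty_b.rb')
    load File.join(dir, 'init_b.rb')
    files_with_line_events do
      EmptyB.new # no Ruby line inside empty_b.rb runs here
      InitB.new  # the body of initialize in init_b.rb runs here
    end
  end
end

seen = line_event_demo
puts seen.include?('empty_b.rb') # false
puts seen.include?('init_b.rb')  # true
```

The set also contains the script that ran the demo, which is why the answer's code filters out __FILE__ at the end.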

You might be able to use the call event better but I don't think much more can be done with set_trace_func.

If you are using Ruby 2.0, you can use TracePoint, which is the replacement for set_trace_func. It has slightly different semantics; in particular, when we track a method call it's easier to get the class it was called on, so

require 'set'
def analyze(filename)
  require_depth = 0
  files = Set.new
  classes_to_files = {}
  trace = TracePoint.new(:call, :line, :return, :c_call, :class) do |tp|
    case tp.event
    when :class
      # Remember which file opened each class body.
      classes_to_files[tp.self] = tp.path
    when :call, :c_call
      if tp.method_id == :require && tp.defined_class == Kernel
        require_depth += 1
      elsif require_depth == 0
        # Attribute the call to the file where the receiver (or its class) was defined.
        path = classes_to_files[tp.self] || classes_to_files[tp.self.class]
        files << path if path
      end
    when :return
      require_depth -= 1 if tp.method_id == :require && tp.defined_class == Kernel
    when :line
      files << tp.path if require_depth == 0
    end
  end

  trace.enable
  load filename
  trace.disable
  files.reject {|f| f == __FILE__ || f =~ %r{/lib/ruby/site_ruby}}
end

does return a,b,c for the test example. It's still subject to the fundamental limitation that it only knows about code that actually gets executed.
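
A minimal sketch of that limitation, with made-up file and class names: two classes are loaded, but only the one on the branch that actually runs shows up in the trace. The helper methods are hypothetical and only record :call events, which is a simplification of the code above.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical helper: basenames of files whose Ruby methods were entered during the block.
def files_called_into
  seen = Set.new
  trace = TracePoint.new(:call) { |tp| seen << File.basename(tp.path) }
  trace.enable
  begin
    yield
  ensure
    trace.disable
  end
  seen
end

# Loads two classes, then runs code in which only one branch is ever taken.
def untaken_branch_demo
  Dir.mktmpdir do |dir|
    %w[Used Unused].each do |name|
      path = File.join(dir, "#{name.downcase}.rb")
      File.write(path, "class #{name}\n  def initialize\n    @ready = true\n  end\nend\n")
      load path
    end
    files_called_into do
      debug = false
      debug ? Unused.new : Used.new # Unused.new is in the source but never executes
    end
  end
end

seen = untaken_branch_demo
puts seen.include?('used.rb')   # true
puts seen.include?('unused.rb') # false
```

A reader of the source would say both files are referenced; the tracer only reports the one that ran, so the result depends on which code paths the traced run exercises.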

Thanks! the idea works but I tweaked it (1.9 version) a little -- I added files on any event as long as the (greater) depth is 1, ie for calls the depth after addition, return before subtraction, and line when depth=1. I'm still messing with it. — alexloh, Apr 09 '13 at 20:41
The dynamic analysis like you pointed out is tricky but with a comprehensive test suite it should work. — alexloh, Apr 09 '13 at 20:43
