
I am iterating over various versions of a piece of code (e.g. Associations.rb in Rails).

What I want to do is just extract one snippet of the code, for example the has_many method:

  def has_many(name, scope = nil, options = {}, &extension)
    reflection = Builder::HasMany.build(self, name, scope, options, &extension)
    Reflection.add_reflection self, name, reflection
  end

At first I was thinking of searching the file for the string def has_many and saving everything between that string and end. The obvious problem is that, depending on the version of the file, the method can contain several end keywords.
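To make the problem concrete, here is a minimal sketch of that naive approach. The sample method and the helper names inside it (register_callback, build_association) are made up for illustration; only the idea of matching up to the first end comes from the question.

```ruby
# Hypothetical sample: a has_many method that contains a nested if ... end.
source = <<~'RUBY'
  def has_many(name, options = {})
    if options[:dependent]
      register_callback(name)
    end
    build_association(name, options)
  end
RUBY

# Naive pattern: lazily match from "def has_many" up to the first "end" keyword.
# The \b anchors keep it from stopping inside words such as "dependent".
naive = /def has_many.*?\bend\b/m

snippet = source[naive]
puts snippet
```

The match stops at the end that closes the if, so the call to build_association and the method's real closing end are lost.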

For instance, whatever I come up with for the snippet above should also work for this one:

  def has_many(association_id, options = {})
    validate_options([ :foreign_key, :class_name, :exclusively_dependent, :dependent, :conditions, :order, :finder_sql ], options.keys)
    association_name, association_class_name, association_class_primary_key_name =
          associate_identification(association_id, options[:class_name], options[:foreign_key])

    require_association_class(association_class_name)

    if options[:dependent] and options[:exclusively_dependent]
      raise ArgumentError, ':dependent and :exclusively_dependent are mutually exclusive options.  You may specify one or the other.' # ' ruby-mode
    elsif options[:dependent]
      module_eval "before_destroy '#{association_name}.each { |o| o.destroy }'"
    elsif options[:exclusively_dependent]
      module_eval "before_destroy { |record| #{association_class_name}.delete_all(%(#{association_class_primary_key_name} = '\#{record.id}')) }"
    end

    define_method(association_name) do |*params|
      force_reload = params.first unless params.empty?
      association = instance_variable_get("@#{association_name}")
      if association.nil?
        association = HasManyAssociation.new(self,
          association_name, association_class_name,
          association_class_primary_key_name, options)
        instance_variable_set("@#{association_name}", association)
      end
      association.reload if force_reload
      association
    end

    # deprecated api
    deprecated_collection_count_method(association_name)
    deprecated_add_association_relation(association_name)
    deprecated_remove_association_relation(association_name)
    deprecated_has_collection_method(association_name)
    deprecated_find_in_collection_method(association_name)
    deprecated_find_all_in_collection_method(association_name)
    deprecated_create_method(association_name)
    deprecated_build_method(association_name)
  end

Assume that each version is stored as text in a column in my db.

How should I approach this: with Ruby's string methods, or in some other way?

Edit 1

Please note that this question is specifically about string manipulation using a regex, without a parser.

marcamillion
  • If you can rely on indentation, check http://rubular.com/r/54KgpTRo8C ... Notice though, it could break with multiline strings, HEREDOC, %{ }, etc. – Mariano Jul 28 '16 at 07:51
  • @Mariano Holy crap. That looks awesome. Can you add it as an answer, and explain the different elements. It may not be perfect, but it's an awesome start! – marcamillion Jul 28 '16 at 07:53
  • Sure, but I'd wait for someone to recommend a code parser. That's the way it should be done (I'm not familiar with ruby parsers). Check http://stackoverflow.com/questions/19451326/parse-ruby-code for example – Mariano Jul 28 '16 at 07:55
  • Oh wow...never knew about parsers. Interesting. I just found an interesting gem - https://github.com/whitequark/parser. Thanks for the hat tip. Either way, I love your RegEx. It looks awesome. – marcamillion Jul 28 '16 at 08:00
  • @Mariano Go ahead and answer your question, assuming no parser. I have asked another question, that is focused specifically on using the parser - http://stackoverflow.com/questions/38630553/how-do-i-use-the-parser-gem-to-extract-this-code-snippet-i-want -- but I really like your answer here so I would love to document it properly. – marcamillion Jul 28 '16 at 08:09
  • @Mariano Awesome. Question updated! – marcamillion Jul 28 '16 at 09:14

1 Answer


As discussed, this should be done with a parser like Ripper.


However, to answer whether it can be done with string methods: the regex below matches the method, provided that:

  • You can rely on indentation i.e. the string has the exact same characters before "def" and before "end".
  • There are no multiline constructs in between that could produce an "end" with the same indentation. That includes multiline strings, HEREDOC, %{ }, etc.
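As an illustration of the second caveat, the sketch below runs the pattern from the Code section (written on one line here) against a made-up method whose heredoc body contains a line starting with end at the same indentation as def. The sample source and the register helper are hypothetical.

```ruby
# Same pattern as in the Code section, written compactly.
regex = /^(\s*)def\ +has_many\b(?:.*+\n)*?\1end\b/x

# Hypothetical input: line 3 is heredoc text, not a keyword, but it starts
# with "end" at the same indentation as the def line.
source = <<'RUBY'
  def has_many(name)
    doc = <<-TEXT
  end of the generated documentation
    TEXT
    register(name, doc)
  end
RUBY

matched = source[regex]
puts matched
```

The match ends at the heredoc line, so everything from "of the generated documentation" onwards is cut off. A regex alone cannot tell that line apart from the real closing end.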

Code

regex = /^
        (\s*)              # matches the indentation (we'll backreference later)
        def\ +has_many\b   # literal "def has_many" with a word boundary
        (?:.*+\n)*?        # match whole lines - as few as possible
        \1                 # matches the same indentation as the def line
        end\b              # literal "end"
        /x

subject = %q|
  def has_many(name, scope = nil, options = {}, &extension)
      if association.nil?
        instance_variable_set("@#{association_name}", association)
      end
  end|


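# Print the matched text. Because the regex has a capture group, a plain
# scan would return only the captured indentation, so enumerate the matches
# and read the whole match from $& instead.
puts subject.to_enum(:scan, regex).map { $& }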



The regex relies on:

  1. Capturing the whitespace (indentation) with the group (\s*),
  2. followed by the literal def has_many.
  3. It then consumes as few lines as it can with (?:.*+\n)*?.
    Notice that .*+\n matches a whole line (the possessive .*+ never gives characters back, so a line is consumed in full or not at all),
    and (?:..)*? repeats it 0 or more times. The last ? makes the repetition lazy (as few as possible).
    It keeps consuming lines until the following condition matches...
  4. \1 is a backreference that matches the exact text captured by group 1, i.e. the same indentation as the def line.
  5. Followed by the literal end and a word boundary.
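Since the goal is to run this over several versions of the file, the steps above could be wrapped in a small helper. This is only a sketch under the same assumptions: extract_method is a hypothetical name, the two sample versions are shortened, made-up stand-ins for the stored text, and [ \t]* is used instead of \s* so the indentation group cannot span a line break. It also assumes the method name ends in a word character (no ? or !).

```ruby
# Hypothetical helper: returns the source of the named method, or nil.
def extract_method(source, name)
  pattern = /^
    ([ \t]*)                          # indentation of the def line
    def\ +#{Regexp.escape(name)}\b    # "def <name>" with a word boundary
    (?:.*+\n)*?                       # whole lines, as few as possible
    \1                                # the same indentation again
    end\b                             # the closing "end"
  /x
  source[pattern]
end

# Two made-up versions of the same file, e.g. as loaded from the db column.
old_version = <<'RUBY'
  def has_many(association_id, options = {})
    if options[:dependent]
      add_callback(association_id)
    end
    define_reader(association_id)
  end

  def has_one(association_id)
  end
RUBY

new_version = <<'RUBY'
  def has_many(name, scope = nil, options = {}, &extension)
    reflection = Builder::HasMany.build(self, name, scope, options, &extension)
    Reflection.add_reflection self, name, reflection
  end
RUBY

[old_version, new_version].each do |source|
  puts extract_method(source, "has_many")
  puts "---"
end
```

Returning nil when nothing matches lets the caller skip versions of the file that do not define the method at all.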



Mariano