In the "Comprehensive Ruby Programming" course e-book, I came across a chapter where the author (Jordan Hudgens) writes:

“The last thing we are going to try is to return all the integer values from the sentence.”

And he does it like this:

string = "The quick 12 brown foxes jumped over 10 lazy dogs"
p string.to_enum(:scan, /\d+/).map { Regexp.last_match }

And it returns:

=> [#<MatchData "12">, #<MatchData "10">]

I wonder when and why Regexp.last_match would be used here, or, to put it better, why the following is not the more efficient way:

p string.to_enum(:scan, /\d+/).map { |i| p i } 

This outputs just an array of the integers, which seems to me a more efficient way to get those numbers.

Could anyone explain what the author's reasons might have been for picking Regexp.last_match?

Eric Duminil
Julius Dzidzevičius
  • What you suggest would be *way* better. Global mutable state is ugly. Using global mutable state *and* relying on an implementation detail? Yuck. It’s possible you should pick a different e-book. – Ry- Apr 02 '17 at 10:12
  • Thank you for the response. The next thing for me to read up on is "global mutable" :) I will delete this post if it seems too broad or unanswerable. As for the book, it was actually quite nice; just this part was a bit strange... – Julius Dzidzevičius Apr 02 '17 at 10:14
  • @Ryan 1. those return _different_ things, 2. `Regexp.last_match` is _not_ global mutable. – Aleksei Matiushkin Apr 02 '17 at 10:40
  • @mudasobwa: Is this about thread locals or something else? I’m also not sure how returning different things is a problem if the end goal was to get the integer values. – Ry- Apr 02 '17 at 10:41
  • Also, you could just use `string.scan(/\d+/)`. – Ry- Apr 02 '17 at 10:44
  • Yes, it’s about thread locals. We have no idea what the original intent was, but this code is used to get an array of `MatchData` instances. – Aleksei Matiushkin Apr 02 '17 at 10:44
  • @mudasobwa: Thread-local is a type of global, and remains as awful as I made it out to be. – Ry- Apr 02 '17 at 10:45
  • Apparently [this is standard for Ruby](https://stackoverflow.com/questions/6804557/how-do-i-get-the-match-data-for-all-occurrences-of-a-ruby-regular-expression-in?lq=1), which is rather horrifying. – Ry- Apr 02 '17 at 10:47

2 Answers

This is a nifty trick (read: hack).

string = "The quick 12 brown foxes jumped over 10 lazy dogs"
p string.to_enum(:scan, /\d+/).map { Regexp.last_match }

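The thing is, String#scan offers no handy way to yield MatchData instances: its block receives only the matched strings (or the capture groups, if the pattern has any). It does, however, update the last match on every iteration, so calling Regexp.last_match inside the block retrieves the MatchData for the current match.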

p string.to_enum(:scan, /\d+/).map { |i| p i } 

does not make much sense (the block just prints each match and returns it unchanged); you probably meant:

p string.to_enum(:scan, /\d+/).map(&:itself) # or { |i| i } # or .to_a

or even

p string.scan(/\d+/) 

The results differ, though: the plain scan variants return strings, while the Regexp.last_match version returns MatchData instances.
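To make the difference concrete, here is a small sketch using the string from the question. The variable names `strings` and `matches` are mine, not the book's; the point is only that a MatchData carries extra information (such as the position of the match) that the plain strings do not.

```ruby
string = "The quick 12 brown foxes jumped over 10 lazy dogs"

# String#scan on its own returns the matched substrings.
strings = string.scan(/\d+/)

# Inside the block driven by scan, Regexp.last_match is the MatchData
# for the match that is currently being yielded.
matches = string.to_enum(:scan, /\d+/).map { Regexp.last_match }

p strings                         # ["12", "10"]
p matches.map(&:to_s)             # ["12", "10"]
p matches.map { |m| m.begin(0) }  # [10, 37] (start offset of each match)
```

If all you need is the numbers themselves, `string.scan(/\d+/).map(&:to_i)` is enough; the MatchData route only pays off when you need offsets, captures, or the text before and after each match.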

Aleksei Matiushkin

Here's a more verbose but possibly cleaner solution if you want an Enumerator of MatchData instances:

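class String
  # Returns an Enumerator that lazily yields one MatchData per match of regex.
  def matches(regex)
    Enumerator.new do |yielder|
      # Start from 0 on every enumeration so the Enumerator can be reused.
      position = 0
      while position <= length && (match = regex.match(self, position))
        yielder << match
        # Step past zero-length matches, otherwise the loop would never end.
        position = match.end(0) > match.begin(0) ? match.end(0) : match.end(0) + 1
      end
    end
  end
end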

string = 'The quick 12 brown foxes jumped over 10 lazy dogs'
p string.matches(/\d+/).to_a
# [#<MatchData "12">, #<MatchData "10">]
p (2**1000000).to_s.matches(/(\d)\1{5}/).first(2)
# [#<MatchData "444444" 1:"4">, #<MatchData "888888" 1:"8">]

If you don't want to monkey-patch String, you could define this method on Regexp instead, or as a stand-alone method that takes the string and the regex as parameters.
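A minimal sketch of the stand-alone variant might look like the following. The method name `each_match` is hypothetical (it is not a Ruby built-in). It uses the same loop idea, keeps `position` inside the enumerator block so the result can be iterated more than once, and steps past zero-length matches so that patterns such as `/x*/` terminate.

```ruby
# Hypothetical helper: returns an Enumerator of MatchData objects
# without reopening String.
def each_match(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while position <= string.length && (match = regex.match(string, position))
      yielder << match
      # Advance by one character after an empty match to avoid an endless loop.
      position = match.end(0) > match.begin(0) ? match.end(0) : match.end(0) + 1
    end
  end
end

string = 'The quick 12 brown foxes jumped over 10 lazy dogs'
numbers = each_match(string, /\d+/).map { |m| m[0].to_i }
p numbers
# [12, 10]
```

Because the result is an Enumerator, calls such as `each_match(string, /\d+/).first(1)` stop after the first match instead of scanning the whole string.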

Eric Duminil
  • Thanks @Eric, but my meta skills are at about 0 or -1, so I'd better wait for someone more experienced to judge it :) P.S. Does monkey patching mean metaprogramming (overriding existing software, in this case)? :) – Julius Dzidzevičius Apr 02 '17 at 19:20
  • @J.D.: Monkey-patching means opening an existing class and modifying it, e.g. by defining a new method. This code isn't really "meta", and is closer to what you would write in Java or Python. – Eric Duminil Apr 02 '17 at 20:06