Regexp.last_match - why is it useful?

Question

In the e-book "Comprehensive Ruby programming course", I came across a chapter where the author (Jordan Hudgens) writes:

“The last thing we are going to try is to return all the integer values from the sentence.”

And he does it like this:

string = "The quick 12 brown foxes jumped over 10 lazy dogs"
p string.to_enum(:scan, /\d+/).map { Regexp.last_match }

And it returns:

=> [#<MatchData "12">, #<MatchData "10">]

I wonder why or when Regexp.last_match would be used, or, to put it better, why the following is not the more efficient way:

p string.to_enum(:scan, /\d+/).map { |i| p i }

This outputs only an array of the numbers, which seems to me a more efficient way to get them.

Could anyone explain what the author's reasons might have been for picking Regexp.last_match?

What you suggest would be *way* better. Global mutable state is ugly. Using global mutable state *and* relying on an implementation detail? Yuck. It’s possible you should pick a different e-book. — Ry-, Apr 02 '17 at 10:12
Thank you for the response. The next thing for me to read up on is "global mutable state" :) I will delete this post if it seems too broad or unanswerable. As for the book, it was actually quite nice; just this part was a bit strange... — Julius Dzidzevičius, Apr 02 '17 at 10:14
@Ryan 1. those return _different_ things, 2. `Regexp.last_match` is _not_ global mutable. — Aleksei Matiushkin, Apr 02 '17 at 10:40
@mudasobwa: Is this about thread locals or something else? I’m also not sure how returning different things is a problem if the end goal was to get the integer values. — Ry-, Apr 02 '17 at 10:41
Yes, it’s about thread locals. We have no idea what the original intent was, but this code is used to get an array of `MatchData` instances. — Aleksei Matiushkin, Apr 02 '17 at 10:44
@mudasobwa: Thread-local is a type of global, and remains as awful as I made it out to be. — Ry-, Apr 02 '17 at 10:45
Apparently [this is standard for Ruby](https://stackoverflow.com/questions/6804557/how-do-i-get-the-match-data-for-all-occurrences-of-a-ruby-regular-expression-in?lq=1), which is rather horrifying. — Ry-, Apr 02 '17 at 10:47

Aleksei Matiushkin · Accepted Answer · 2017-04-02T10:59:35.080

2

This is a nifty trick (read: hack).

string = "The quick 12 brown foxes jumped over 10 lazy dogs"
p string.to_enum(:scan, /\d+/).map { Regexp.last_match }

The thing is there is no handy way to yield instances of MatchData from inside String#scan.

p string.to_enum(:scan, /\d+/).map { |i| p i }

does not make much sense; you probably meant:

p string.to_enum(:scan, /\d+/).map(&:itself) # or { |i| i } # or .to_a

or even

p string.scan(/\d+/)

The results differ, though; the latter returns strings, while the former is a way to return MatchData instances.
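
To make the difference concrete, here is a small sketch (the variable names are illustrative, not from the book) that runs both forms on the same sentence. It suggests why one might want `MatchData`: each instance also carries position and context information that the plain strings from `scan` do not.

```ruby
string = "The quick 12 brown foxes jumped over 10 lazy dogs"

# String#scan without a block returns plain strings.
strings = string.scan(/\d+/)

# The to_enum trick collects the MatchData that scan sets on each iteration.
matches = string.to_enum(:scan, /\d+/).map { Regexp.last_match }

p strings                         # ["12", "10"]
p matches.map { |m| m[0] }        # ["12", "10"]
p matches.map { |m| m.begin(0) }  # [10, 37]
p matches.first.pre_match         # "The quick "
```

If only the numbers themselves are needed, `string.scan(/\d+/).map(&:to_i)` is enough.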

edited Apr 02 '17 at 10:59

answered Apr 02 '17 at 10:43

Aleksei Matiushkin


Why would you `.map(&:itself)`? – Ry- Apr 02 '17 at 10:44
@Ryan To make the code actually execute. `to_enum` returns a lazy enumerator. – Aleksei Matiushkin Apr 02 '17 at 10:45
So a long way of saying `.to_a`? – Ry- Apr 02 '17 at 10:46
Yes; I tried to make as few changes to the OP’s code as possible. – Aleksei Matiushkin Apr 02 '17 at 10:47
@mudasobwa, thanks. As @Justin wrote under one of your other answers, wouldn't this be somehow different? `string.to_enum(:scan, /\d+/).map { |m| p $~ }` ? I am totally new to this, so I don't understand what `$~` means... I could ask another question if you like, I just don't even know how to ask it :) – Julius Dzidzevičius Apr 02 '17 at 19:14

`$~` is an exact (though cryptic) synonym of `Regexp.last_match`. – Aleksei Matiushkin Apr 03 '17 at 04:04
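
A short sketch of that equivalence, and of the scoping argued about in the comments on the question. The helper `first_number_in` is hypothetical and exists only for illustration; the Ruby documentation describes these regexp variables as thread-local and method-local, so a match made inside the helper should not disturb the caller's `$~`.

```ruby
"over 10 lazy dogs" =~ /\d+/

# Both spellings read the same most recent match.
via_global = $~[0]
via_method = Regexp.last_match(0)
p via_global == via_method   # true

# Hypothetical helper, used only to show that the match state is per method.
def first_number_in(text)
  text =~ /\d+/
  $~[0]
end

inner = first_number_in("12 brown foxes")
outer_after = $~[0]

p inner        # "12"
p outer_after  # "10" (the caller's last match is untouched)
```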

Eric Duminil · Answer 2 · 2017-04-02T11:23:54.713

2

Here's a more verbose but possibly cleaner solution if you want an Enumerator of MatchData instances:

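class String
  # Returns an Enumerator that yields one MatchData per match of regex.
  def matches(regex)
    Enumerator.new do |yielder|
      # Start from 0 on every enumeration so the enumerator can be reused.
      position = 0
      while (match = regex.match(self, position))
        yielder << match
        # Step past an empty match so the loop always advances.
        position = match.end(0) > match.begin(0) ? match.end(0) : match.end(0) + 1
      end
    end
  end
end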

string = 'The quick 12 brown foxes jumped over 10 lazy dogs'
p string.matches(/\d+/).to_a
# [#<MatchData "12">, #<MatchData "10">]
p (2**1000000).to_s.matches(/(\d)\1{5}/).first(2)
# [#<MatchData "444444" 1:"4">, #<MatchData "888888" 1:"8">]

If you don't want to monkey patch String, you could define this method in Regexp or as a stand-alone method with string and regex as parameters.

edited Apr 02 '17 at 11:23

answered Apr 02 '17 at 11:09

Eric Duminil


Thanks @Eric, but my meta skills are at about 0 or -1, so I had better wait for someone more experienced to judge it :) P.S. Does monkey patching mean metaprogramming (overriding software in this case)? :) – Julius Dzidzevičius Apr 02 '17 at 19:20

@J.D. : Monkey-patching means opening an existing class and modifying it, e.g. by defining a new method. This code isn't really "meta", and is closer to what you would write in Java or Python. – Eric Duminil Apr 02 '17 at 20:06
