17

I want to get the index as well as the results of a scan

"abab".scan(/a/)

I would like to have not only

=> ["a", "a"]

but also the index of those matches

[1, 3]

Any suggestions?

Jørgen R
adn

4 Answers

28

Try this:

res = []
"abab".scan(/a/) do |c|
  res << [c, $~.offset(0)[0]]
end

res.inspect # => [["a", 0], ["a", 2]]
Todd Yandell
  • 19
    @Todd's answer is right. However if you prefer to avoid using the slightly cryptic special variables like `$~` then `Regexp.last_match` is equivalent. i.e. you can say `Regexp.last_match.offset(0)[0]` – mikej Aug 19 '10 at 13:53
  • 9
    or even `Regexp.last_match.offset(0).first` – John La Rooy Aug 19 '10 at 21:41
  • 6
    For those wondering how these methods work, see [`MatchData#offset`](http://ruby-doc.org/core-2.1.1/MatchData.html#method-i-offset) and [`Regexp::last_match`](http://ruby-doc.org/core-2.1.1/Regexp.html#method-c-last_match) – sameers Oct 25 '15 at 18:37
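The suggestions in the comments above can be sketched roughly as follows. This is only an illustration: the helper name `scan_with_index` is made up here and is not part of Ruby. It uses `Regexp.last_match` in place of `$~`, and `MatchData#begin(0)`, which returns the same start offset as `offset(0).first`.

```ruby
# Hypothetical helper (the name is an assumption, not a core method).
# Returns [match, start_index] pairs for every non-overlapping match.
def scan_with_index(str, pattern)
  res = []
  str.scan(pattern) do |match|
    # Regexp.last_match is the MatchData for the match just yielded;
    # begin(0) is the character index where the whole match starts.
    res << [match, Regexp.last_match.begin(0)]
  end
  res
end

scan_with_index("abab", /a/) # => [["a", 0], ["a", 2]]
```

Note that if the pattern contains capture groups, `scan` yields the groups rather than the whole match, so `match` would be an array in that case.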
6

There's a gotcha to look out for here, depending on the behaviour you expect.

If you search for /dad/ in "dadad", you get only [["dad", 0]], because scan resumes at the end of each match once it finds one, so overlapping matches are skipped (which seems wrong to me).
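To make that behaviour visible, here is a small sketch that records where each match starts and ends. The helper name `scan_spans` is invented for this illustration.

```ruby
# Hypothetical helper (name is an assumption): records
# [match, start_index, end_index] for each match that scan reports.
def scan_spans(str, pattern)
  spans = []
  str.scan(pattern) do
    m = Regexp.last_match
    spans << [m[0], m.begin(0), m.end(0)]
  end
  spans
end

scan_spans("dadad", /dad/) # => [["dad", 0, 3]]
```

The first match ends at index 3, and the search resumes from there, where only "ad" is left, so the second "dad" starting at index 2 is never reported.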

I came up with this alternative:

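def scan_str(str, pattern)
  res = []
  # Try the pattern at every starting index so overlapping matches are found.
  (0..str.length).each do |i|
    # \A anchors at the start of the substring; ^ would also match after a newline.
    res << [Regexp.last_match.to_s, i] if str[i..-1] =~ /\A#{pattern}/
  end
  res
end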

If you wanted, you could do something similar with StringScanner from the standard library; it might be faster for long strings.
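A minimal sketch of what the StringScanner variant might look like is below. The helper name `scan_overlapping` is hypothetical, and the sketch assumes single-byte (e.g. ASCII) input, because `StringScanner#pos` and `#matched_size` count bytes rather than characters. Whether it is actually faster has not been measured here.

```ruby
require 'strscan'

# Hypothetical helper (name is an assumption). Collects [match, index]
# pairs, including overlapping matches. Assumes single-byte characters,
# since StringScanner positions are byte offsets.
def scan_overlapping(str, pattern)
  scanner = StringScanner.new(str)
  res = []
  while scanner.scan_until(pattern)
    # scan_until leaves pos just past the match, so step back by its size.
    start = scanner.pos - scanner.matched_size
    res << [scanner.matched, start]
    # An empty match at the very end leaves nothing more to scan.
    break if start >= str.bytesize
    # Resume one position after the match start, not after its end.
    scanner.pos = start + 1
  end
  res
end

scan_overlapping("dadad", /dad/) # => [["dad", 0], ["dad", 2]]
```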

jim
4

Very similar to what @jim has said and works a bit better for longer strings:

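def matches(str, pattern)
  arr = []
  while str && (m = str.match(pattern))
    offset = m.offset(0).first
    # offset is relative to the remaining substring, which begins one
    # character past the previous match start, hence the + 1.
    arr << offset + (arr[-1] ? arr[-1] + 1 : 0)
    str = str[(offset + 1)..-1]
  end
  arr
end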
user81269
1

It surprised me that there isn't a method similar to String#scan that returns an array of MatchData objects, the way String#match returns a single one. So, if you like monkey-patching, you can combine this with Todd's solution (Enumerator was introduced in 1.9):

class Regexp
  def scan str
    Enumerator.new do |y|
      str.scan(self) do
        y << Regexp.last_match
      end
    end
  end
end
#=> nil
/a/.scan('abab').map{|m| m.offset(0)[0]}
#=> [0, 2]
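If you would rather not reopen Regexp, a similar result can probably be had with `to_enum`. This is a sketch of that alternative, not the approach above; the helper name `each_match` is made up for the example.

```ruby
# Hypothetical helper (name is an assumption). Returns an array of
# MatchData objects, one per non-overlapping match, without patching Regexp.
def each_match(str, pattern)
  # to_enum(:scan, pattern) wraps the block form of scan in an Enumerator;
  # inside map, Regexp.last_match is the MatchData for the current match.
  str.to_enum(:scan, pattern).map { Regexp.last_match }
end

each_match("abab", /a/).map { |m| m.begin(0) } # => [0, 2]
```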
Mladen Jablanović