
I'd like to match a regex and get the position of each match in the string.

For example,

"AustinTexasDallasTexas".match_with_posn /(Texas)/

I'd like match_with_posn to return something like [6, 17], where 6 and 17 are the start positions of both instances of the word Texas.

Is there anything like this?

Austin Richardson
  • possible duplicate of [How to get indexes of all occurrences of a pattern in a string](http://stackoverflow.com/questions/4274388/how-to-get-indexes-of-all-occurrences-of-a-pattern-in-a-string) – Nakilon Sep 11 '15 at 12:51

3 Answers


Using Ruby 1.8.6+, you can do this:

require 'enumerator' # Only needed on 1.8.6; newer versions do not need this.

s = "AustinTexasDallasTexas"
positions = s.enum_for(:scan, /Texas/).map { Regexp.last_match.begin(0) }

This will create an array with:

=> [6, 17]
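The same positions can be collected without building an explicit enumerator, because `String#scan` called with a block sets `Regexp.last_match` for each match before yielding. The sketch below wraps that in a helper named after the question's `match_with_posn`; the name and the plain-method form (rather than a method added to `String`) are assumptions for illustration only.

```ruby
# Hypothetical helper named after the question's match_with_posn.
# Returns the start index of every non-overlapping match of re in str.
def match_with_posn(str, re)
  positions = []
  # scan sets Regexp.last_match ($~) before each yield,
  # so the block can read the offset of the current match.
  str.scan(re) { positions << Regexp.last_match.begin(0) }
  positions
end

p match_with_posn("AustinTexasDallasTexas", /Texas/) # prints [6, 17]
```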
Sean Hill
  • If you want to find `atea` in `Isateateatest`, it will return `[2]`, but 5 is also a possibility. – adc Dec 26 '14 at 17:18
  • The "a" at index 5 is used to match the "atea" found at index 2. If you search for "ate", you will get an array of `[2, 5, 8]`. If you want to find overlapping matches, then use a lookahead assertion: `/(?=(atea))/`. `positions = s.enum_for(:scan, /(?=(atea))/).map { Regexp.last_match.begin(0) } #=> [2, 5]` – Sean Hill Dec 26 '14 at 17:47
  • Can the person who down voted this please explain the down vote? – Sean Hill Mar 03 '15 at 20:25
  • Can you explain this in detail? – Subha Feb 08 '18 at 06:14
  • It's returning an enumerator for `scan`, which finds the matches in a string for the argument passed to it, in this case, `/Texas/`. Without the enumerator, it would normally return the part of the string that matched. Since we are using the enumerator, we can map over the matches in such a way that we can return the index for each `scan` result. What essentially happens is that `map` iterates the enumerator returned by `enum_for`, which runs `scan` and yields each match in turn; `scan` sets `Regexp.last_match` before each yield, so the block can return that match's start index. – Sean Hill Feb 13 '18 at 04:43
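To see the difference the comments above describe between ordinary and overlapping matches, here is a minimal sketch on the `Isateateatest` example. It uses `scan` with a block rather than `enum_for`, and a lookahead without the capture group, which is enough when only the start index is needed (an assumption about what you want back).

```ruby
s = "Isateateatest"

# Ordinary scan: after matching "atea" at 2, the search resumes at 6,
# so the occurrence starting at 5 is skipped.
plain = []
s.scan(/atea/) { plain << Regexp.last_match.begin(0) }

# Zero-width lookahead: each match consumes no characters, so scan
# steps forward one position at a time and finds overlapping occurrences.
overlapping = []
s.scan(/(?=atea)/) { overlapping << Regexp.last_match.begin(0) }

p plain       # prints [2]
p overlapping # prints [2, 5]
```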

Sort of: `String#index` returns the position of the first match only.

"AustinTexasDallasTexas".index(/Texas/)
=> 6

To get every match, you could extend the String API:

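class String
  # Returns the start index of every match of e, resuming the search
  # one character past each previous match (so matches may overlap).
  def indices(e)
    start, result = -1, []
    result << start while (start = self.index(e, start + 1))
    result
  end
end
p "AustinTexasDallasTexas".indices(/Texas/)
=> [6, 17]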
DigitalRoss
  • Suppose the string were `"aaaa"` and `e` were `"aa"`. The question is unclear as to whether the desired return value is `[0,1,2]` or `[0,2]`. You return the former. To return the latter make `index`'s second argument `start+e.size` and initialize `start` to `-e.size`. No need for `self.`. – Cary Swoveland Aug 21 '18 at 14:46
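The `start + e.size` tweak in the comment above assumes `e` is a `String`; a `Regexp` has no `size` method, and its match length can vary. A sketch that returns non-overlapping positions for either kind of pattern resumes each search at the end of the previous match instead. The method name `disjoint_indices` is hypothetical, and converting a `String` pattern with `Regexp.union` (which escapes it, so `"a.b"` matches only that literal text) is a design choice made here, not something from the answer.

```ruby
class String
  # Hypothetical helper: start index of each non-overlapping match.
  def disjoint_indices(pattern)
    re = Regexp.union(pattern) # escapes a String, passes a Regexp through
    result = []
    pos = 0
    while (m = re.match(self, pos))
      result << m.begin(0)
      # Resume at the end of this match; step one character past an
      # empty match so the loop always makes progress.
      pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
    end
    result
  end
end

p "aaaa".disjoint_indices("aa")                      # prints [0, 2]
p "AustinTexasDallasTexas".disjoint_indices(/Texas/) # prints [6, 17]
```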
"AustinTexasDallasTexas".gsub(/Texas/).map { Regexp.last_match.begin(0) }
  #=> [6, 17]
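Called with only a pattern, `String#gsub` returns an `Enumerator` instead of a new string, and iterating it sets `Regexp.last_match` for each match, which is what the one-liner above relies on. As a small extension beyond what the question asks, `MatchData#offset(0)` returns the start and end positions together; a sketch:

```ruby
s = "AustinTexasDallasTexas"

# With no replacement and no block, gsub returns an Enumerator over the matches.
enum = s.gsub(/Texas/)

# Start positions only, as in the answer above.
starts = enum.map { Regexp.last_match.begin(0) }

# offset(0) returns [start, end] for the whole match (end is exclusive).
spans = s.gsub(/Texas/).map { Regexp.last_match.offset(0) }

p enum.class # prints Enumerator
p starts     # prints [6, 17]
p spans      # prints [[6, 11], [17, 22]]
```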
Cary Swoveland