14

I want to find the indices of all occurrences of a substring in a larger string using Ruby. For example, all occurrences of "in" in "Einstein":

str = "Einstein"
str.index("in") #returns only 1
str.scan("in")  #returns ["in","in"]
#desired output would be [1, 6]
Dimitar
Mokhtar

4 Answers

24

The standard hack is:

indices = "Einstein".enum_for(:scan, /(?=in)/).map do
  Regexp.last_match.offset(0).first
end
#=> [1, 6]
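
The lookahead (?=in) matches the empty string at every position where "in" starts, so no characters are consumed and overlapping occurrences are reported as well. A minimal sketch of the difference, wrapping the same idiom in a helper (the name match_offsets is hypothetical, not part of the answer):

```ruby
# Hypothetical helper wrapping the enum_for(:scan, ...) idiom shown above.
def match_offsets(str, regex)
  # Regexp.last_match is the MatchData of the match scan just produced.
  str.enum_for(:scan, regex).map { Regexp.last_match.begin(0) }
end

match_offsets("Einstein", /(?=in)/) #=> [1, 6]
match_offsets("nnnn", /(?=nn)/)     #=> [0, 1, 2]  (zero-width, overlapping)
match_offsets("nnnn", /nn/)         #=> [0, 2]     (consuming, non-overlapping)
```

For a literal substring that may contain regex metacharacters, building the pattern with Regexp.escape would be the safer route.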
tokland
9
def indices_of_matches(str, target)
  sz = target.size
  (0..str.size-sz).select { |i| str[i,sz] == target }
end

indices_of_matches('Einstein', 'in')
  #=> [1, 6]
indices_of_matches('nnnn', 'nn')
  #=> [0, 1, 2]

The second example reflects an assumption I made about the treatment of overlapping strings. If overlapping strings are not to be considered (i.e., [0, 2] is the desired return value in the second example), this answer is obviously inappropriate.
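
If non-overlapping matches are wanted instead, one possible sketch uses String#index with a start offset and skips past each match. The helper name non_overlapping_indices is hypothetical and not part of this answer:

```ruby
# Hypothetical helper: indices of non-overlapping occurrences of target.
def non_overlapping_indices(str, target)
  return [] if target.empty? # guard: an empty target would never advance

  result = []
  pos = 0
  # String#index(substring, offset) returns nil once nothing more is found.
  while (i = str.index(target, pos))
    result << i
    pos = i + target.size # resume the search after the whole match
  end
  result
end

non_overlapping_indices('Einstein', 'in') #=> [1, 6]
non_overlapping_indices('nnnn', 'nn')     #=> [0, 2]
```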

Cary Swoveland
6

This is a more verbose solution which brings the advantage of not relying on a global value:

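def indices(string, regex)
  Enumerator.new do |yielder|
    position = 0 # set per enumeration, so the Enumerator can be iterated again
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.end(0) # continue after the end of this match
    end
  end
end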

p indices("Einstein", /in/).to_a
# [1, 6]

It returns an Enumerator, so you could also use it lazily or take just the first n indices.
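
A short sketch of that usage follows. The method is restated so the snippet runs on its own, with position initialized inside the Enumerator block (an adjustment assumed here so that the same Enumerator can be iterated more than once); the sample string is made up for illustration:

```ruby
def indices(string, regex)
  Enumerator.new do |yielder|
    position = 0 # fresh start for every enumeration
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.end(0)
    end
  end
end

enum = indices("in in in in", /in/)

enum.first(2)                            #=> [0, 3]  (stops after two matches)
enum.lazy.select { |i| i > 0 }.first(1)  #=> [3]     (lazy chain, stops early)
enum.to_a                                #=> [0, 3, 6, 9]
```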

Also, if you might need more information than just the indices, you could return an Enumerator of MatchData and extract the indices:

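def matches(string, regex)
  Enumerator.new do |yielder|
    position = 0 # set per enumeration, so the Enumerator can be iterated again
    while (match = regex.match(string, position))
      yielder << match # yield the whole MatchData, not just the index
      position = match.end(0)
    end
  end
end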

p matches("Einstein", /in/).map{ |match| match.begin(0) }
# [1, 6]

To get the behaviour described by @Cary (overlapping matches are reported too), replace position = match.end(0) inside the loop with position = match.begin(0) + 1.
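
With that change applied, the method might look like the sketch below. The name overlapping_indices is hypothetical, chosen only to keep it apart from the version above:

```ruby
# Hypothetical variant: resume one character after the START of each match,
# so occurrences that overlap the previous match are found as well.
def overlapping_indices(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.begin(0) + 1
    end
  end
end

overlapping_indices("nnnn", /nn/).to_a     #=> [0, 1, 2]
overlapping_indices("Einstein", /in/).to_a #=> [1, 6]
```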

Eric Duminil
1

A recursive function:

    def indexes(string, sub_string, start = 0)
      # String#index takes a start offset, so no substring copy is needed
      index = string.index(sub_string, start)
      return [] unless index
      [index] + indexes(string, sub_string, index + 1)
    end

For more convenient usage, I would open the String class:

    class String
      def indexes(sub_string, start = 0)
        index = index(sub_string, start)
        return [] unless index
        [index] + indexes(sub_string, index + 1)
      end
    end

This way we can call it like this: "Einstein".indexes("in") #=> [1, 6]

Qaisar Nadeem