I want to find the indices of all occurrences of a substring in a larger string using Ruby, e.g. every "in" in "Einstein":
str = "Einstein"
str.index("in") #returns only 1
str.scan("in") #returns ["in","in"]
#desired output would be [1, 6]
The standard hack is:
indices = "Einstein".enum_for(:scan, /(?=in)/).map do
  Regexp.last_match.offset(0).first
end
#=> [1, 6]
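The lookahead `(?=in)` matters here: it matches a zero-width position, so `scan` moves forward one character at a time instead of skipping past each match, and overlapping occurrences are found too. A minimal sketch of the difference, where `match_count` is a hypothetical helper and "nnnn" is an illustrative input, neither taken from the question:

```ruby
# Hypothetical helper, for illustration only: counts the matches scan finds.
def match_count(text, pattern)
  text.scan(pattern).size
end

p match_count("nnnn", /nn/)      # consuming pattern: overlapping matches are skipped
p match_count("nnnn", /(?=nn)/)  # zero-width lookahead: every start position is tested
```

The first call prints 2 and the second prints 3.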
def indices_of_matches(str, target)
  sz = target.size
  (0..str.size-sz).select { |i| str[i,sz] == target }
end
indices_of_matches('Einstein', 'in')
#=> [1, 6]
indices_of_matches('nnnn', 'nn')
#=> [0, 1, 2]
The second example reflects an assumption I made about overlapping matches. If overlapping matches should not be counted (i.e., [0, 2] is the desired return value in the second example), this answer is not appropriate.
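If non-overlapping matches are what you want, one possible sketch is to loop over `String#index` with a start offset and jump past each match. The method name `indices_of_disjoint_matches` is hypothetical, and returning an empty array for an empty target is an assumption made here to keep the loop finite:

```ruby
# Sketch only: collects start indices of non-overlapping occurrences of target.
def indices_of_disjoint_matches(str, target)
  return [] if target.empty?   # assumption: an empty target yields no matches

  result = []
  pos = 0
  # String#index(substring, offset) returns nil once nothing more is found.
  while (i = str.index(target, pos))
    result << i
    pos = i + target.size      # skip past the match so the next one cannot overlap it
  end
  result
end

p indices_of_disjoint_matches('Einstein', 'in')
p indices_of_disjoint_matches('nnnn', 'nn')
```

This prints [1, 6] and [0, 2].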
This is a more verbose solution, with the advantage of not relying on a global value such as Regexp.last_match:
def indices(string, regex)
  Enumerator.new do |yielder|
    # Start from 0 on every enumeration, so the enumerator can be reused.
    position = 0
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.end(0)
    end
  end
end
p indices("Einstein", /in/).to_a
# [1, 6]
It returns an Enumerator, so you can also use it lazily or take just the first n indices.
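As a rough illustration of that point, the sketch below repeats the `indices` method so that it runs on its own (with `position` initialised inside the block, so each enumeration starts from the beginning) and then takes only part of the result. The `i + 1` mapping is just a placeholder transformation:

```ruby
def indices(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.end(0)
    end
  end
end

# Only as many matches as requested are searched for.
p indices("Einstein", /in/).first(1)

# Lazy chaining: each index is transformed as it is produced.
p indices("Einstein", /in/).lazy.map { |i| i + 1 }.first(2)
```

This prints [1] and [2, 7].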
Also, if you need more information than just the indices, you could return an Enumerator of MatchData objects and extract the indices from them:
def matches(string, regex)
  Enumerator.new do |yielder|
    # Start from 0 on every enumeration, so the enumerator can be reused.
    position = 0
    while (match = regex.match(string, position))
      yielder << match
      position = match.end(0)
    end
  end
end
p matches("Einstein", /in/).map{ |match| match.begin(0) }
# [1, 6]
To get the behaviour described by @Cary (overlapping matches are counted), you could replace the line position = match.end(0) in the while loop with position = match.begin(0) + 1.
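Spelled out, that variant might look like the sketch below. The name `overlapping_matches` is hypothetical; the only change from `matches` is how `position` advances:

```ruby
# Sketch only: yields a MatchData for every match, including overlapping ones.
def overlapping_matches(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while (match = regex.match(string, position))
      yielder << match
      # Advance by one character past the match start, not past the match end,
      # so the next search can begin inside the current match.
      position = match.begin(0) + 1
    end
  end
end

p overlapping_matches("nnnn", /nn/).map { |match| match.begin(0) }
p overlapping_matches("Einstein", /in/).map { |match| match.begin(0) }
```

This prints [0, 1, 2] and [1, 6].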
# Recursive function
def indexes(string, sub_string, start = 0)
  # String#index takes a start offset, so no substring needs to be sliced off.
  index = string.index(sub_string, start)
  return [] unless index
  [index] + indexes(string, sub_string, index + 1)
end
# For better usage, I would open the String class
class String
  def indexes(sub_string, start = 0)
    # String#index takes a start offset, so no substring needs to be sliced off.
    position = index(sub_string, start)
    return [] unless position
    [position] + indexes(sub_string, position + 1)
  end
end
This lets us call it like so: "Einstein".indexes("in") #=> [1, 6]