
I am trying to return the indices of all occurrences of a specific character in a string using Ruby. An example string is "a#asg#sdfg#d##" and the expected return is [1,5,10,12,13] when searching for # characters. The following code does the job, but is there a simpler way of doing this?

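def occurrences(line)
  index = 0
  all_index = []

  # each_byte yields Integers. In Ruby 1.8 '#'[0] was the Integer 35, but in
  # Ruby 1.9+ it is the String "#", so compare against '#'.ord instead.
  line.each_byte do |x|
    if x == '#'.ord
      all_index << index
    end
    index += 1
  end

  all_index
end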
Cœur
Gerhard

6 Answers

s = "a#asg#sdfg#d##"
a = (0 ... s.length).find_all { |i| s[i,1] == '#' }
FMc
  • `s = "a#asg#sdfg#d##"; a = (0 ... s.length).find_all { |i| s[i] == '#' }` should work too, right? No need for the `,1`? – Sam Joseph May 21 '15 at 18:55
  • @SamJoseph In this case, yes, the two are synonymous. The 2 argument version of `[x, y]` means "a substring of length `y` starting at `x`", which is the same as `[x]`, which means "character at `x` (also a string because ruby doesn't have a Char type)". – Eric Haynes Oct 29 '16 at 19:44
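A quick illustration of the difference Eric Haynes describes, using the question's string. The expected values in the comments assume Ruby 1.9 or later, where a single index returns a one-character String rather than a byte value.

```ruby
s = "a#asg#sdfg#d##"

# Two-argument form: substring of length 1 starting at index 1.
p s[1, 1]  # => "#"
# One-argument form: the character at index 1, returned as a String in 1.9+.
p s[1]     # => "#"
# The two forms only differ once the requested length is greater than 1.
p s[1, 3]  # => "#as"
```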
require 'enumerator' # Needed in 1.8.6 only
"1#3#a#".enum_for(:scan,/#/).map { Regexp.last_match.begin(0) }
#=> [1, 3, 5]

ETA: This works by creating an Enumerator that uses scan(/#/) as its each method.

scan yields each occurrence of the specified pattern (in this case /#/) and inside the block you can call Regexp.last_match to access the MatchData object for the match.

MatchData#begin(0) returns the index where the match begins and since we used map on the enumerator, we get an array of those indices back.
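To make the mechanism explicit, here is a sketch that calls scan with a block directly instead of wrapping it in an Enumerator; the method name `match_starts` is hypothetical. It relies on the documented behavior that scan's block form sets the current MatchData before each yield, so Regexp.last_match inside the block refers to the current match.

```ruby
# Hypothetical helper: collect the start index of every match of pattern.
def match_starts(str, pattern)
  starts = []
  str.scan(pattern) do
    # Regexp.last_match is the MatchData for the match scan just found;
    # begin(0) is the index where the whole match starts.
    starts << Regexp.last_match.begin(0)
  end
  starts
end

p match_starts("1#3#a#", /#/)  # => [1, 3, 5]
```

The `enum_for(:scan, /#/).map { ... }` version does the same work; the Enumerator just lets map drive the iteration.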

sepp2k

Here's a less-fancy way:

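x = "a#asg#sdfg#d##"  # the string to search
i = -1
all = []
# String#index returns nil once no '#' remains after position i, ending the loop
while (i = x.index('#', i + 1))
  all << i
end
all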

In a quick speed test this was about 3.3x faster than FM's find_all method, and about 2.5x faster than sepp2k's enum_for method.
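A sketch of the same loop wrapped in a reusable method; the name `all_indices` and its `target` parameter are hypothetical, not part of the answer. Because each search restarts at `i + 1` rather than after the whole match, overlapping matches of a multi-character target are all reported.

```ruby
# Hypothetical helper wrapping the String#index loop above.
# String#index(target, offset) returns the position of the next match at or
# after offset, or nil when there are no more matches, which ends the loop.
def all_indices(str, target)
  positions = []
  i = -1
  while (i = str.index(target, i + 1))
    positions << i
  end
  positions
end

p all_indices("a#asg#sdfg#d##", "#")  # => [1, 5, 10, 12, 13]
p all_indices("aaaa", "aa")            # => [0, 1, 2]
```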

glenn mcdonald
  • Those speed figures were from 1.8.5. In 1.9.1 this is still fastest by a wide margin, but find_all is about 3x slower and enum_for is about 5x slower! – glenn mcdonald Nov 30 '09 at 14:12
  • My quick guess is that it's `Regexp.last_match.begin(0)` that's slowing down the `enum_for` method. (That is, I hope that `enum_for` itself is not the problem.) Either way, I like that this is both simple and readable. Less fancy is often more good. – Telemachus Nov 30 '09 at 14:55
  • This is faster because a block is executed for every character in the other approaches. I came across and solved a similar question at http://stackoverflow.com/questions/6387428/why-is-counting-letters-faster-using-stringcount-than-using-stringchars-in-ruby/6475413#6475413 – Andrew Grimm Jun 28 '11 at 07:28

Here's a long method chain:

"a#asg#sdfg#d##".
  each_char.
  each_with_index.
  inject([]) do |indices, (char, idx)|
    indices << idx if char == "#"
    indices
  end

# => [1, 5, 10, 12, 13]

Requires Ruby 1.8.7+.

Nakilon
glenn jackman
  • In 1.9 you can do `.each_char.with_index` (instead of `each_char.each_with_index`). It reads better that way, I think. – Telemachus Nov 30 '09 at 15:00
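For comparison, a sketch of the chain using the 1.9+ `with_index` form Telemachus mentions. Replacing `inject` with `select` followed by `map` is my own choice here, not part of the answer.

```ruby
s = "a#asg#sdfg#d##"

# each_char.with_index yields [char, index] pairs (Ruby 1.9+).
# select keeps the pairs whose character is "#"; map keeps only the index.
indices = s.each_char.with_index
           .select { |char, _idx| char == "#" }
           .map { |_char, idx| idx }

p indices  # => [1, 5, 10, 12, 13]
```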

Another solution derived from FMc's answer:

s = "a#asg#sdfg#d##"
q = []
s.length.times {|i| q << i if s[i,1] == '#'}

I love that Ruby never has only one way of doing something!

Gerhard

Here's a solution for massive strings. I'm doing text finds on 4.5 MB text strings and the other solutions grind to a halt. This takes advantage of the fact that Ruby's .split is very efficient compared to string comparisons.

def indices_of_matches(str, target)
  cuts = (str + (target.hash.to_s.gsub(target,''))).split(target)[0..-2]
  indices = []
  loc = 0
  # Each cut is the text before a match, so its length moves loc onto the match.
  cuts.each do |cut|
    loc = loc + cut.size
    indices << loc
    loc = loc + target.size
  end
  indices
end

It's basically using the horsepower behind the .split method, then using the separate parts and the length of the searched string to work out locations. I've gone from 30 seconds using various methods to instantaneous on extremely large strings.

I'm sure there's a better way to do it, but:

(str + (target.hash.to_s.gsub(target,'')))

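appends a suffix to the string so that a target at the very end is still counted (by default, split drops trailing empty strings, so otherwise trailing matches would be lost). The gsub strips any occurrence of the target from the suffix, so the "random" addition can't introduce a match of its own.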

indices_of_matches("a#asg#sdfg#d##","#")
=> [1, 5, 10, 12, 13]
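A variant of the same split-based idea, sketched under the assumption that `target` is a non-empty string; the method name `split_indices` is hypothetical. Instead of the hash-based suffix, it passes a negative limit to `String#split`, which keeps trailing empty fields, so a match at the very end still produces a field. Escaping the target into a Regexp also sidesteps split's special awk-style handling of a single-space string pattern. Like the original, it reports non-overlapping matches only.

```ruby
# Hypothetical variant of indices_of_matches (not the answer's code).
# Assumes target is a non-empty String.
def split_indices(str, target)
  # A literal-match Regexp avoids split's special treatment of " ".
  pattern = Regexp.new(Regexp.escape(target))
  # A negative limit keeps trailing empty fields, so no sentinel is needed.
  fields = str.split(pattern, -1)
  positions = []
  loc = 0
  # Every field except the last is followed by one match of target.
  fields[0..-2].each do |field|
    loc += field.size
    positions << loc
    loc += target.size
  end
  positions
end

p split_indices("a#asg#sdfg#d##", "#")  # => [1, 5, 10, 12, 13]
```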
KeiferJ