5

I am trying to group all the repeated letters in a string.

Eg:

"aaaaaaabbbbbbbbc" => [['aaaaaaa'],['bbbbbbbb'],['c']]

Using plain Ruby, the only way I could find to get what I want was:

.scan(/(?:a+|A+)|(?:b+|B+)|(?:c+|C+)| ..... (?:y+|Y+)|(?:z+|Z+)/)

where ... are the other alphabet letters.

Is there a way to DRY up that regex? I also tried a backreference (\1), but it doesn't match single letters and it doesn't return the exact run of letters that matched => (\w+)\1 => [['aa'],['bb']]

Am I wrong to use regular expressions for this case? Should I use Ruby methods with iteration instead?

I will be glad to hear your opinion :) Thanks!

misterwolf

4 Answers

7

Just use another capturing group to catch the repeated characters.

s.scan(/((\w)\2*)/).map(&:first)
# => ["aaaaaaa", "bbbbbbbb", "c"]
Avinash Raj
  • 1
    Interesting! You could instead write `map(&:first)`. The way [String#scan](http://ruby-doc.org/core-2.4.0/String.html#method-i-scan) deals with groups is like that double-edged sword you are holding: it's either convenient or--as here--an irritating impediment. – Cary Swoveland Oct 28 '17 at 21:18
  • @Cary Thanks as always :-) – Avinash Raj Oct 29 '17 at 11:01
  • I like this solution, but I also want to understand it. Could someone put into words how the regexp is set up, especially '\2*'? Thanks! – krystonen Dec 11 '18 at 11:19
  • 1
    @Krisztina in regex, we use `\` followed by a number to refer back to captured chars: `\1` refers to the chars captured by the first capturing group `()`, `\2` to those captured by the second group, and so on. So here `(\w)` captures a single word char, and `(\w)\1*` matches that word char plus zero or more repetitions of the same char; for example, it matches `b`, `nn`, `b` in the input string `bnnb`. Since `scan` returns only the captured chars when the regex contains groups, I wrap the whole regex inside another group. – Avinash Raj Dec 11 '18 at 15:07
  • 1
    But once the whole regex is wrapped, `\1` refers to the new outer group, which is now the first one. We want the inner group, which is now the second, and that's why I used `\2`. – Avinash Raj Dec 11 '18 at 15:08
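To make the role of the two groups concrete, here is a minimal sketch (the variable names are only illustrative, not from the answer) comparing what `scan` returns with one group and with the extra outer group:

```ruby
s = "aaaaaaabbbbbbbbc"

# With a single group, scan returns only what that group captured:
# one character per run, not the whole run.
single = s.scan(/(\w)\1*/)
# single is [["a"], ["b"], ["c"]]

# With an outer group added, each match yields [whole_run, first_char].
# The inner (\w) is now group 2, hence the \2 backreference.
pairs = s.scan(/((\w)\2*)/)
# pairs is [["aaaaaaa", "a"], ["bbbbbbbb", "b"], ["c", "c"]]

# Keeping only the first element of each pair gives the runs.
runs = pairs.map(&:first)
# runs is ["aaaaaaa", "bbbbbbbb", "c"]
```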
2

One more solution without regexp :)

"aaaaaaabbbbbbbbc".chars.group_by(&:itself).values.map { |e| [e.join] }
 #=> [["aaaaaaa"], ["bbbbbbbb"], ["c"]]
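One thing that may be worth checking against your own input: `group_by` collects every occurrence of a character, not only consecutive ones, so it can differ from the regex approach when a letter reappears later in the string. A small sketch, assuming a made-up input `"aabaa"`:

```ruby
s = "aabaa" # hypothetical input where "a" appears in two separate runs

# group_by merges all occurrences of the same character into one group.
grouped = s.chars.group_by(&:itself).values.map(&:join)
# grouped is ["aaaa", "b"]

# The backreference regex keeps consecutive runs separate.
runs = s.scan(/((\w)\2*)/).map(&:first)
# runs is ["aa", "b", "aa"]
```

For the string in the question both approaches give the same groups, since each letter occurs in a single run.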
Oleksandr Holubenko
1

Without using a regex, you could take a look at Enumerable#slice_when:

string = "aaaaaaabbbbbbbbc"
p string.chars.sort.slice_when { |a, b| a != b }.map { |element| element.join.split }
# [["aaaaaaa"], ["bbbbbbbb"], ["c"]]
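To see what `slice_when` produces before the final `map`, here is a minimal sketch on the same string (the variable names are illustrative). The block receives each adjacent pair of elements, and a new slice starts wherever it returns true:

```ruby
string = "aaaaaaabbbbbbbbc"

# slice_when yields arrays of characters, one array per run of equal characters.
slices = string.chars.slice_when { |a, b| a != b }.to_a
sizes = slices.map(&:size)
# sizes is [7, 8, 1]

# Joining each slice gives the flat form ...
flat = slices.map(&:join)
# flat is ["aaaaaaa", "bbbbbbbb", "c"]

# ... and wrapping each string in an array gives the nested form from the question.
nested = flat.map { |run| [run] }
# nested is [["aaaaaaa"], ["bbbbbbbb"], ["c"]]
```

This sketch omits the `sort` call, so the runs stay in the order in which they appear in the string.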
Sebastián Palma
1

Here are a few other ways to do that. All return ["aaaaaaa", "bbbbbbbb", "c"]. If [["aaaaaaa"], ["bbbbbbbb"], ["c"]] is truly wanted (I can't imagine why), that's a simple extra step using map.

s.each_char.chunk(&:itself).map(&:join)

s.each_char.chunk_while { |a,b| b == a }.map(&:join)

s[1..-1].each_char.with_object([s[0]]) { |c, a| c == a.last[0] ? (a.last << c) : a << c }

s.gsub(/(.)\1*/).with_object([]) { |t,a| a << t }

In the last of these, String#gsub is called without a block or a replacement argument, so it returns an enumerator (and does not perform any character replacement). This form of gsub is useful in many situations.
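As a minimal sketch of that enumerator behaviour (the variable names and the run-length example are illustrative additions, not part of the answer):

```ruby
s = "aaaaaaabbbbbbbbc"

# gsub with only a pattern returns an Enumerator over the matches;
# the original string is not modified.
enum = s.gsub(/(.)\1*/)

runs = enum.to_a
# runs is ["aaaaaaa", "bbbbbbbb", "c"]

# Because it is an enumerator, other Enumerable methods chain as well,
# for example pairing each run's character with its length.
counts = s.gsub(/(.)\1*/).map { |run| [run[0], run.size] }
# counts is [["a", 7], ["b", 8], ["c", 1]]
```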

Cary Swoveland