
I need to extract the longest substring of uppercase characters. So, out of the string:

"aaBBBBcBBdDDD"

I need to get "BBBB".

Is there a convenient Ruby method for that or a regexp of some kind? I tried:

```ruby
string.scan(/[[:upper:]]/)
```

That's almost it, but it returns ALL the capital characters one by one, not the longest sequence.
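For reference, this is what the attempt returns, next to the same pattern with a `+` quantifier added. It is only an illustration: the quantifier groups consecutive capitals, but choosing the longest run still has to happen outside the regex.

```ruby
s = "aaBBBBcBBdDDD"

# Without a quantifier, each uppercase letter is a separate match.
singles = s.scan(/[[:upper:]]/)
p singles # prints ["B", "B", "B", "B", "B", "B", "D", "D", "D"]

# With +, consecutive uppercase letters are grouped into one match per run.
runs = s.scan(/[[:upper:]]+/)
p runs # prints ["BBBB", "BB", "DDD"]
```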

Asked by Leo; edited by the Tin Man.
  • That's not something regex can do. (At the very least, it's not something regex is designed to do.) – Aran-Fey Dec 29 '14 at 13:31
  • As @Rawing said, that's not what Regexp are for. Regular expressions are great at finding things that *look* like something, but they're terrible at finding something that is exactly something when presented with multiple choices, especially if there is any wiggle room. – the Tin Man Dec 29 '14 at 21:00
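A quick way to see the point made in these comments: a regex engine reports the leftmost match, not the longest one. In this sketch the input `"aBBcDDDD"` is a hypothetical string chosen so that the longest run is not the first one.

```ruby
s = "aBBcDDDD"

# String#[] with a regex returns only the first (leftmost) match.
first_run = s[/[[:upper:]]+/]
p first_run # prints "BB", even though "DDDD" is longer

# scan returns every match; comparing lengths is left to Ruby code.
all_runs = s.scan(/[[:upper:]]+/)
p all_runs # prints ["BB", "DDDD"]
```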

4 Answers


Use regex to get an array of uppercase words, then use Enumerable#max_by to find the longest:

```ruby
"aaBBBBcBBdDDD".scan(/[[:upper:]]+/).max_by {|x| x.length}
# => "BBBB"
```

or simpler:

```ruby
"aaBBBBcBBdDDD".scan(/[[:upper:]]+/).max_by(&:length)
# => "BBBB"
```
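Wrapped in a small method (the name `longest_upper_run` is just for illustration), the `max_by` version also covers two edge cases: with no uppercase letters `scan` returns an empty array and `max_by` returns `nil`, and when several runs share the maximum length `max_by` returns the first of them.

```ruby
# Hypothetical helper name: longest run of uppercase letters, or nil if none.
def longest_upper_run(str)
  str.scan(/[[:upper:]]+/).max_by(&:length)
end

p longest_upper_run("aaBBBBcBBdDDD") # prints "BBBB"
p longest_upper_run("no capitals")   # prints nil
p longest_upper_run("aaBBcDD")       # prints "BB" (on a tie, the first maximum wins)
```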
Answered by Yu Hao

You can't find the longest match with a regex alone; you also need some built-in Ruby methods.

```
> m = "aaBBBBcBBdDDD".scan(/[[:upper:]]+/)
=> ["BBBB", "BB", "DDD"]
> vc = m.sort{|a,b| b.size <=> a.size}
=> ["BBBB", "DDD", "BB"]
> vc.delete_if{|a| a.size < vc.first.size}
=> ["BBBB"]
```
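The three interactive steps above can be collected into one method. This sketch (the method name is hypothetical) reads the maximum size once instead of calling `vc.first.size` inside the block, and because `delete_if` keeps every string as long as the first, ties come back together. Note that `Array#sort` is not stable, so the order of equally long runs is not guaranteed.

```ruby
# Hypothetical helper name: every uppercase run whose length equals the longest,
# using the sort-then-delete_if approach.
def longest_upper_runs(str)
  runs = str.scan(/[[:upper:]]+/)
  # Longest first; equal-length runs may end up in any order.
  sorted = runs.sort { |a, b| b.size <=> a.size }
  return sorted if sorted.empty?

  # Read the maximum once rather than on every iteration of the block.
  max_size = sorted.first.size
  # delete_if mutates sorted in place and returns it.
  sorted.delete_if { |s| s.size < max_size }
end

p longest_upper_runs("aaBBBBcBBdDDD") # prints ["BBBB"]
```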
Answered by Avinash Raj; edited by the Tin Man.
  • yep, `+` matches the previous token one or more times. – Avinash Raj Dec 29 '14 at 13:38
  • What if the string looks like this: `aaBBBBAABcBBdDDD` ? It would return `BBBBAAB` – hek2mgl Dec 29 '14 at 13:38
  • yep , it returns `BBBBAAB` – Avinash Raj Dec 29 '14 at 13:40
  • I thought sequences of *identical* uppercased characters should be returned. – hek2mgl Dec 29 '14 at 13:45
  • Don't use `sort{|a,b| b.size <=> a.size}`. `sort_by` would be faster because it remembers the intermediate value, instead of `sort` having to compute the `size` of `a` and `b` each pass through the loop. – the Tin Man Dec 29 '14 at 19:22
  • @theTinMan so you are suggesting `.sort_by{|a| -a.length}` or `.sort_by(&:length).reverse`? Or are you suggesting changing the last line to `.delete_if{|a| a.length < vc.last.length}`? When benchmarking, `sort_by` seems to be slower than `sort` in this case. Also, with regard to this method as written, it determines the max on each pass through `delete_if` by referencing itself. I don't think that is the best approach. – engineersmnky Dec 29 '14 at 20:37
  • I'd suggest reading "[Sorting an array in descending order in Ruby](http://stackoverflow.com/questions/2642182/sorting-an-array-in-descending-order-in-ruby/2651028#2651028)". – the Tin Man Dec 29 '14 at 20:55
  • Using `delete_if` isn't the best choice. I'd use `max_by` personally, but YMMV. – the Tin Man Dec 29 '14 at 20:57
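If, as hek2mgl suggests, a "run" should mean a sequence of the *same* uppercase letter, the regex `[[:upper:]]+` is the wrong tool because it treats `BBBBAAB` as a single run. This sketch swaps in `Enumerable#chunk_while` to group consecutive identical characters first; that interpretation of the question is an assumption, and the method name is hypothetical.

```ruby
# Assumption: only consecutive IDENTICAL uppercase letters form a run,
# so "BBBBAAB" counts as "BBBB", "AA" and "B".
def longest_identical_upper_run(str)
  # Group consecutive identical characters: "aaBBBBAAB" -> ["aa", "BBBB", "AA", "B"]
  runs = str.chars.chunk_while { |a, b| a == b }.map(&:join)
  # Keep only the runs made of uppercase letters, then take the longest (nil if none).
  runs.select { |run| run.match?(/\A[[:upper:]]+\z/) }.max_by(&:length)
end

p longest_identical_upper_run("aaBBBBAABcBBdDDD") # prints "BBBB"
```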
```
([A-Z]+)
```

Try this. Capture all the groups; the one with the maximum length is your answer. See the demo:

https://regex101.com/r/gX5qF3/11
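In Ruby, "capture all the groups" could look like the sketch below. With a capture group, `String#scan` returns one array per match, so the result is flattened before the longest string is picked. Note that `[A-Z]` only matches ASCII capitals, whereas `[[:upper:]]` (used in the other answers) also matches uppercase letters such as "Ä".

```ruby
s = "aaBBBBcBBdDDD"

# With a capture group, scan returns an array of group arrays.
groups = s.scan(/([A-Z]+)/)
p groups # prints [["BBBB"], ["BB"], ["DDD"]]

# Flatten to plain strings, then choose the longest in Ruby.
longest = groups.flatten.max_by(&:length)
p longest # prints "BBBB"
```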

Answered by vks

You did not specify the expected result when more than one string has the same maximum length.

@AvinashRaj's answer will handle this, while @YuHao's will not. If you want only one result, I would suggest @YuHao's answer; if you want all the results, I would change @AvinashRaj's answer to something like this:

```ruby
"aaBBBBcBBdDDDD".scan(/[[:upper:]]+/).tap do |a|
  max_length = a.map(&:length).max
  a.delete_if { |x| x.length < max_length }
end
#=> ["BBBB", "DDDD"]
```
Answered by engineersmnky
  • I don't think `tap` buys you anything here. It would be clearer, imo, to write: `a = "aaBBBBcBBdDDDD".scan(/[[:upper:]]+/); max_length = a.map(&:length).max; a.select { |s| s.size == max_length }`. – Cary Swoveland Dec 29 '14 at 19:44
  • @CarySwoveland thank you for the response. There are probably a dozen more simply conceived options for this. I just don't see a need for multiple local variables that will not be used anywhere else. Benchmarking them shows no significant difference in performance either; they are all within +/- 10%. – engineersmnky Dec 29 '14 at 20:31
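For comparison, Cary Swoveland's non-mutating suggestion can be packaged as a method (the name is hypothetical). It computes the maximum length once and uses `select`, so the original match order is preserved and an input without capitals simply yields an empty array.

```ruby
# Hypothetical helper name: all uppercase runs that reach the maximum length,
# returned in the order they appear in the string.
def all_longest_upper_runs(str)
  runs = str.scan(/[[:upper:]]+/)
  max_length = runs.map(&:length).max # nil when runs is empty
  runs.select { |s| s.length == max_length }
end

p all_longest_upper_runs("aaBBBBcBBdDDDD") # prints ["BBBB", "DDDD"]
p all_longest_upper_runs("abc")            # prints []
```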