0

I'm looking for a way to find the value and index of the first occurrence of the largest element in an array (in ruby). I have not been able to find a way to achieve this with fewer than 2 lines of code and without traversing the array more than I would like.

First I tried:

example = [1, 100, 2, 100, 3]
value, index = example.each_with_index.max

But this gives me the index of the last occurrence of the largest element.

My current approach is:

example = [1, 100, 2, 100, 3]
value = example.max
index = example.find_index(value)

Which works, but if the first and last occurrences of the largest element are the last two elements of the array, my current code has to traverse pretty much the entire array twice.

I'm more interested in performance than code size, but if there is a non-wasteful one-line solution that would be great!

Pend
  • 662
  • 7
  • 12

2 Answers2

3

Here is a one-pass solution, but I would be surprised if it were faster than the OP’s two-pass solution, considering that Array#max and Array#find_index are both implemented in C and are optimized for arrays (as opposed to their Enumerable counterparts).

 value, index = example.each_with_index.max_by { |n,i| [n,-i] }
  #=> [100, 1]

When you tried example.each_with_index.max, you first computed

enum = example.each_with_index
  #=> #<Enumerator: [1, 100, 2, 100, 3]:each_with_index>

We can convert this enumerator to an array to see what elements it will generate.

enum.to_a
  #=> [[1, 0], [100, 1], [2, 2], [100, 3], [3, 4]]

When Array#max is sent to enum it orders the elements generated by enum, largest to smallest, as follows.

[[100, 3], [100, 1], [3, 4], [2, 2], [1, 0]]

The reason for this ordering is explained in the third paragraph in the doc for Array#<=>.

That's why you obtained

enum.max
  #=> [100, 3]

By contrast, I have Enumerable#max_by searching for the largest value in

a = [[1, 0], [100, -1], [2, -2], [100, -3], [3, -4]]

which is

a.max
  #=> [100, -1]

so [100, 1] is returned.

Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100
  • Why the difference between the `max` and `max_by`? – Sagar Pandya Dec 07 '17 at 03:18
  • @SagarPandya Do you mean **what is** the difference? – sawa Dec 07 '17 at 03:26
  • @sawa I mean, why does `max` return the last occurrence whereas `max_by(&:first)` returns the first. – Sagar Pandya Dec 07 '17 at 03:30
  • @SagarPandya Because, with `max`, when the first element is the same, the comparison depends on the second element. That is how comparison of arrays works. `max_by(&:first)` states to take only the first element into consideration. – sawa Dec 07 '17 at 03:32
  • Pretty sure I'm going to mark this as the answer just as soon as I understand what's happening under the hood. Am I right in understanding this is basically creating an array of element-index pairs, sorting them and returning the first? – Pend Dec 07 '17 at 03:34
  • Hold the phone. I will need to revise this. There is no guarantee the `max_by` will return the first maximum element in an array. – Cary Swoveland Dec 07 '17 at 03:39
  • @CarySwoveland You mean `max_by` is not stable? Then you can perhaps try `group_by`. – sawa Dec 07 '17 at 03:40
  • @sawa, I mean that `example.each_with_index.max_by(&:first)` is not guaranteed to return `[100,1]` rather than `[100,3]`, just because the former is generated first by the enumerator. See the selected answer [here](https://stackoverflow.com/questions/46286715/how-does-rubys-max-function-order-duplicates). I changed my answer to avoid that problem. – Cary Swoveland Dec 07 '17 at 03:47
  • 1
    @StefanPochmann, I read the question as asking for a one-pass solution, not a one-pass solution that is faster than the OP`s two-pass solution. I made an edit to clarify that. The edit reflects my thinking at the time. I am not the least surprised by the results of your benchmark. – Cary Swoveland Dec 07 '17 at 16:08
  • Maybe. But to me it looks like their goal is speed. And that they're just one of those people who think that one-pass is faster than two-pass. An XY problem where they told us both X and Y. Question: `example.each_with_index.max` is also "optimized C", right? That takes about 25 seconds in my test, almost as slow as yours, and much slower than mine with Ruby code (which takes 4.8 seconds). So I think to a large degree the slowness is the "fault" of the extra enumerator(s?). – Stefan Pochmann Dec 07 '17 at 19:29
  • You're right, I did assume a one pass would be faster than a two-pass in this case, so this has been a good lesson on checking my assumptions and more thoroughly benchmarking. Also to not let my assumptions color my questions in the future. Additionally, I learned much more than I previously knew about Enumerable#max_by. Thank-you both – Pend Dec 07 '17 at 23:46
1

I'm more interested in performance

You really should measure then. Your own solution is about 75 times faster than Cary's solution (who also really should've measured).

              user     system      total        real
Yours     0.484000   0.000000   0.484000 (  0.485868)
Cary's   36.094000   0.000000  36.094000 ( 36.412679)

Yes, your solution traverses the array twice. But it's two fast traversals. Cary only has one traversal, but it's a slow one. He's doing much much more per element, an enumerator gets involved, and he's partially doing it in Ruby code.

My benchmark code:

require 'benchmark'

example = (1..10**8).to_a
Benchmark.bm(7) do |x|
  x.report("Yours") {
    value = example.max
    index = example.index(value)
  }
  x.report("Cary's") {
    value, index = example.each_with_index.max_by { |n,i| [n,-i] }
  }
end

I btw changed your example.find_index to example.index. Same thing. Just shorter name.


The fastest single-pass solution (if you for some reason insist on one) I came up with is this:

value, index = example[0], 0
i = 0
example.each do |v|
  if v > value
    value = v
    index = i
  end
  i += 1
end

That took about 4.8 seconds, so about 7.5 times faster than Cary's but still about 10 times slower than yours.


Bottom line: Use your own solution. You can even make it a short one-liner (but I wouldn't):

index = example.index(value = example.max)
Stefan Pochmann
  • 27,593
  • 8
  • 44
  • 107
  • Good point on thorough bench-marking. I did compare the run-times with the data I'm working on, and I didn't see a difference that I thought was significant (I assumed the very tiny difference I did sometimes see would be offset by greater efficiency on larger datasets). But it is obvious in your benchmarks, and when I dramatically crank up the size of the data I'm working on, that the 2 pass solution is faster. Thanks for catching this! – Pend Dec 07 '17 at 23:29
  • @Pending Question: What Ruby version are you using? I just remembered that Ruby 2.4 introduced Array#max which made it *a lot* faster (than using the mixed in Enumerable#max). I'm using Ruby 2.4.1. For [1,2,3] your two-pass is "only" about 14 times faster than Cary's one-pass for me (measuring doing it ten million times). If you're using a Ruby version under 2.4 then that might explain similar speed of both solutions on such small arrays. – Stefan Pochmann Dec 08 '17 at 09:06