10

I have a sorted unique array and want to efficiently insert an element into it that is not in the array like this:

a = [1,2,4,5,6]
new_elm = 3
insert_at = a.bsearch_index {|x| x > new_elm } # => 2
a.insert(insert_at, new_elm) # now a = [1,2,3,4,5,6]

The method bsearch_index doesn't exist: only bsearch, which returns the matching element rather than the matching element's index. Is there any built in way to accomplish this?

alk
  • 69,737
  • 10
  • 105
  • 255
Jonah
  • 15,806
  • 22
  • 87
  • 161
  • possible duplicate of [Is there a built-in binary-search In Ruby?](http://stackoverflow.com/questions/8672472/is-there-a-built-in-binary-search-in-ruby) – ArthurChamz May 05 '14 at 21:02
  • Wait a minute, it's you! Is it the same question or did I understand it wrong? – ArthurChamz May 05 '14 at 21:05
  • It is me! Except if you read the two questions carefully you will see they are not at all the same. – Jonah May 05 '14 at 21:36

6 Answers6

10

You can use the Enumerator object returned by each_with_index to return a nested array of [value, index] pairs and then conduct your binary search on that array:

a = [1,2,4,5,6]
new_elm = 3

index = [*a.each_with_index].bsearch{|x, _| x > new_elm}.last
=> 2

a.insert(index, new_elm)

EDIT:

I've run some simple benchmarks in response to your question with an array of length 1e6 - 1:

require 'benchmark'

def binary_insert(a,e)
  index = [*a.each_with_index].bsearch{|x, _| x > e}.last
  a.insert(index, e)
end

a = *1..1e6
b = a.delete_at(1e5)
=> 100001

Benchmark.measure{binary_insert(a,b)}
=> #<Benchmark::Tms:0x007fd3883133d8 @label="", @real=0.37332, @cstime=0.0, @cutime=0.0, @stime=0.029999999999999805, @utime=0.240000000000002, @total=0.2700000000000018> 

With this in mind, you might consider trying using a heap or a trie instead of an array to store your values. Heaps in particular have constant insertion and removal time complexities, making them ideal for large storage applications. Check out this article here: Ruby algorithms: sorting, trie, and heaps

jwfearn
  • 28,781
  • 28
  • 95
  • 122
wvandaal
  • 4,265
  • 2
  • 16
  • 27
  • Interesting idea. Do you know how the performance will be? I'd be concerned that the Enumerator to array conversion would be slow for large arrays, especially if I'm using it within a loop, but I'm only guessing at how it works internally.... – Jonah May 05 '14 at 21:54
9

How about using SortedSet?:

require 'set'

a = SortedSet.new [1,2,4,5,6]
new_elm = 3
a << new_elm # now a = #<SortedSet: {1, 2, 3, 4, 5, 6}>

SortedSet is implemented using rbtree. I've made the following benchmark:

def test_sorted(max_idx)
  arr_1 = (0..max_idx).to_a
  new_elm = arr_1.delete(arr_1.sample)
  arr_2 = arr_1.dup
  set_1 = SortedSet.new(arr_1)
  Benchmark.bm do |x|
    x.report { arr_1.insert(arr_1.index { |x| x > new_elm }) }
    x.report { arr_2.insert([*arr_2.each_with_index].bsearch{|x, _| x > new_elm}.last) }
    x.report { set_1 << new_elm }
  end
end

With the following results:

test_sorted 10_000
# =>       user     system      total        real
# =>   0.000000   0.000000   0.000000 (  0.000900)
# =>   0.010000   0.000000   0.010000 (  0.001868)
# =>   0.000000   0.000000   0.000000 (  0.000007)

test_sorted 100_000
# =>       user     system      total        real
# =>   0.000000   0.000000   0.000000 (  0.001150)
# =>   0.000000   0.010000   0.010000 (  0.048040)
# =>   0.000000   0.000000   0.000000 (  0.000013)

test_sorted 1_000_000
# =>       user     system      total        real
# =>   0.040000   0.000000   0.040000 (  0.062719)
# =>   0.280000   0.000000   0.280000 (  0.356032)
# =>   0.000000   0.000000   0.000000 (  0.000012)
Uri Agassi
  • 36,848
  • 14
  • 76
  • 93
7

"The method bsearch_index doesn't exist": Ruby 2.3 introduces bsearch_index. (Kudos for getting the method name right before it existed).

steenslag
  • 79,051
  • 16
  • 138
  • 171
6

Ruby 2.3.1 introduced bsearch_index thus the question can now be resolved this way:

a = [1,2,4,5,6]
new_elm = 3

index = a.bsearch_index{|x, _| x > new_elm}
=> 2

a.insert(index, new_elm)
eXa
  • 618
  • 8
  • 18
3

Try this

(0...a.size).bsearch { |n| a[n] > new_element }

This uses bsearch defined on Range to search the array and thus returns the index.


Performance will be way better than each_with_index which materializes O(n) temporary array tuples and thus clogs up garbage collection.

akuhn
  • 27,477
  • 2
  • 76
  • 91
-2

The index method accepts a block and will return the first index where the block is true

a = [1,2,4,5,6] 
new_elem = 3
insert_at = a.index{|b| b > new_elem}
#=> 2
a.insert(insert_at, new_elm) 
#=>[1,2,3,4,5,6]
engineersmnky
  • 25,495
  • 2
  • 36
  • 52
  • This doesn't answer the question since index does not use binary search internally -- that is the key to the question. – Jonah May 05 '14 at 21:39
  • @jonah why binary search you have no reasoning stated for this so I gave a fully viable option based on the example – engineersmnky May 06 '14 at 01:42
  • 1
    i specifically ask for bsearch, but returning an index. that *was* the question. i specifically mention efficiency. all you did was take my example and substitute `index` for `bsearch_index`. obviously i knew the `index` method exists, so sorry, your answer is not a "fully viable" option. – Jonah May 06 '14 at 02:21
  • @jonah for your example it is viable as far as efficiency goes have you even tested it. It is far more efficient than chaining in bsearch. Look at Uri Agassi's Benchmarks. Sorted Set seems like the most efficient way to go but yes this is viable becuase you think binary is more efficient does not make it true. – engineersmnky May 06 '14 at 13:12