Should melding/merging of binomial heaps be done in one pass or two?

Question

Okasaki's implementation in Purely Functional Data Structures (page 22) does it in two: one to merge the forest, and one to propagate the carries. This strikes me as harder to analyze, and also probably slower, than a one-pass version. Am I missing something?

Okasaki's implementation:

functor BinomialHeap (Element:ORDERED):HEAP=
struct
  structure Elem=Element
  datatype Tree = Node of int*Elem.T*Tree list
  type Heap = Tree list
  fun link (t1 as Node (r,x1,c1), t2 as Node (_,x2,c2))=
    if Elem.leq(x1,x2)
    then Node (r+1,x1,t2::c1)
    else Node (r+1,x2,t1::c2)
  fun insTree (t,[])=[t]
     |insTree (t,ts as t'::ts')=
        if rank t < rank t' then t::ts else insTree(link(t,t'),ts')
  fun insert (x,ts)=insTree(Node(0,x,[]),ts) (*just for reference*)
  fun merge (ts1,[])=ts1
     |merge ([],ts2)=ts2
     |merge (ts1 as t1::ts1', ts2 as t2:ts2')=
        if rank t1 < rank t2 then t1::merge(ts1',ts2)
        else if rank t2 < rank t1 then t2::merge(ts1,ts2')
        else insTree (link(t1,t2), merge (ts1',ts2'))
end

This strikes me as hard to analyze because you have to prove an upper bound on the cost of propagating all the carries (see below). The top-down merge implementation I came up with is much more obviously O(log n) where n is the size of the larger heap:

functor BinomialHeap (Element:ORDERED):HEAP=
struct
  structure Elem=Element
  datatype Tree = Node of int*Elem.T*Tree list
  type Heap = Tree list
  fun rank (Node(r,_,_))=r
  fun link (t1 as Node (r,x1,c1), t2 as Node (_,x2,c2))=
    if Elem.leq(x1,x2)
    then Node (r+1,x1,t2::c1)
    else Node (r+1,x2,t1::c2)
  fun insTree (t,[])=[t]
     |insTree (t,ts as t'::ts')=
        if rank t < rank t' then t::ts else insTree(link(t,t'),ts')
  fun insert (x,ts)=insTree(Node(0,x,[]),ts)

  fun merge(ts1,[])=ts1
     |merge([],ts2)=ts2
     |merge (ts1 as t1::ts1', ts2 as t2::ts2')=
        if rank t1 < rank t2 then t1::merge(ts1',ts2)
        else if rank t2 < rank t1 then t2::merge(ts1,ts2')
        else mwc(link(t1,t2),ts1',ts2')
  (*mwc=merge with carry*)
  and mwc (c,ts1,[])=insTree(c,ts1)
     |mwc (c,[],ts2)=insTree(c,ts2)
     |mwc (c,ts1 as t1::ts1', ts2 as t2::ts2')=
        if rank c < rank t1
        then if rank c < rank t2 then c::merge(ts1,ts2)
                  else mwc(link(c,t2),ts1,ts2')
        else mwc(link(c,t1),ts1',ts2)
end

Proof that Okasaki's implementation is O(log n): if a carry is "expensive" (requires one or more links), then it produces a zero. Therefore the next expensive carry will stop when it reaches that point. So the total number of links required to propagate all the carries is no more than around the total length of the binary representation before propagation, which is bounded above by ceil(log n), where n is the size of the larger heap.

I don't think it makes much of a difference either way. mwc is a manually fused combination of insTree and merge. Without having run any benchmarks, it's probably slightly faster (in actual time, not in big-O), but I'd be surprised if it was enough faster to make any difference to an application. If that intuition is correct, then it boils down to personal preference -- (slightly) simpler code vs (slightly) simpler analysis. — Chris Okasaki, Jul 14 '12 at 15:46
Thanks, @ChrisOkasaki. I actually didn't come up with mwc by thinking of it as combining insTree with merge. Rather, I was translating the usual by-hand algorithm for binary addition, adding bit by bit and handling carrying along the way--that was just the most obvious way for me to think about it. I first thought to have a helper function taking a Node option for the carry, but then realized it would be better to split it in two. — dfeuer, Jul 15 '12 at 00:12
@ChrisOkasaki you should post your comment as an answer! It's great to see you answering questions on SO, btw. — sclv, Feb 05 '13 at 01:39

Should melding/merging of binomial heaps be done in one pass or two?

0 Answers0