1

I've been using a data structure which the original developer called a heap, it's used to implement a priority queue.

While there is a lot written about binary trees, (min/max) heaps seem less well defined (details vary between implementations).

Some characteristics I've noticed that don't necessarily apply to binary-tree structures.

  • The same element can be in the queue multiple times without causing complication in execution or implementation.
  • Searching (while possible and faster than an exhaustive search), is not so efficient (since the child elements of each node don't have to be balanced).
  • Since searching is not efficient and there is a potential for duplicates, removal may require storing a reference to the node, instead of using a key to look-up the node (which is common practice for binary trees).
  • Changing priorities in the heap is trivial, compared to a binary tree where it's most common to delete+insert.
    (with a better best case and a worse worst case compared to binary-trees)

Is there more detailed terminology for data structures that match these characteristics?
Or is it simply a min/max heap which happens to be used as a priority-queue?


Note, heres a link to a min-heap that has the characteristics described above.

ideasman42
  • 42,413
  • 44
  • 197
  • 320

2 Answers2

3

A binary heap is a concrete implementation of the priority queue abstract data structure. It's popular because it's easy to implement, memory efficient, and reasonably fast: O(log n) insertion, and O(log n) removal of the root (smallest in a min heap, largest in a max heap) element. Most implementations also provide a peek method that allows viewing the root element without removing it.

A binary heap doesn't do anything else particularly well. Contrary to your observation, finding a particular item in a binary heap requires a sequential scan. Although the nodes are ordered (not sorted), the order doesn't lend itself well to searching.

Typical implementation of a binary heap is in an array. Due to the shape property (the structure can be viewed as a perfect (or complete) binary tree), which means that the relationship between parent and child is represented implicitly. The items are stored in the array in breadth-first order.

As user templatetypedef pointed out in his answer, a binary heap is a specific kind of binary tree, and shouldn't be confused with a binary search tree, which is specifically designed for quick insertion and deletion of items, and locating items by key.

Although changing the priority of an item in the heap, or removing an arbitrary item from the heap, are very easy, the problem as you pointed out is locating the item to be operated on. In a typical binary heap, finding the item to be modified requires a sequential search. If you need the ability to move items around in the heap, you would typically marry the binary with a dictionary or hash map that's indexed by the item's key. The value is the index of the item in the array. That index is updated every time an item is moved. This slows heap operations by a constant factor, but gives you the ability to locate an item in O(1).

There's also something called a Min-max heap, which is a type of binary heap that gives you O(1) access to both the min and max item. The implementation is very similar to the implementation of a standard binary min heap.

To confuse matters even more, there also exists the d-ary heap, which is a heap that contains more than two children per node. For example, a trinary heap has three children per node. These, too, are implemented in arrays with implicit child pointers.

There are other data structures that are commonly called heaps, but are in fact not really related to the heap at all except that they're different implementations of the priority queue data structure. The most popular seem to be Pairing heap, Fibonacci heap, and Binomial heap, all of which can also be implemented with binary trees. (Again, not binary search trees.)

I wrote a somewhat guided introduction to binary heaps (and d-ary heaps) in my blog a few years ago. If you're interested, check out this entry, which lists all of the articles in that series.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
  • re: *"Contrary to your observation, finding a particular item in a binary heap requires a sequential scan. Although the nodes are ordered (not sorted), the order doesn't lend itself well to searching."* my point was that when the the child nodes aren't ordered predictably, a lookup wont be as efficient as it would be for a balanced binary tree. – ideasman42 Oct 31 '17 at 04:46
  • @ideasman42: You said, "Searching (*while possible and faster than an exhaustive search*) ..." My point was that searching for an item in a binary heap requires an exhaustive search. – Jim Mischel Oct 31 '17 at 04:55
  • Since the parent and children are ordered in a min/max heap, a search can skip branches that exceed the search value, even if both branches need to be traversed, I think this would average better performance than comparing against every element. OTOH - I can't imagine many cases where this would be a good idea *(perhaps searching for values known to be near the top of the heap)*, otherwise - just use a binary-search-tree. – ideasman42 Oct 31 '17 at 05:18
  • @ideasman42: In theory, you're right. In practice, it's easier, and often faster, to do a sequential scan. – Jim Mischel Oct 31 '17 at 11:23
  • That means the statement: *"Contrary to your observation, finding a particular item in a binary heap requires a sequential scan"* isn't true. – ideasman42 Oct 31 '17 at 11:43
2

I think you’re confusing binary search trees and binary trees. A binary tree is more of a shape than anything else - it’s a tree where each node has at most two children. The nodes don’t necessarily have to have values in them, and if they do, there’s no requirement that they obey any particular rules.

A binary search tree is a binary tree where each node holds a key, and each node obeys the rule that all keys in the left subtree are less than the node’s key and all keys in the right subtree are greater than the node’s key. (Some definitions relax the requirement to allow for less-than-or-equal-to instead of just less than, etc.)

There are many other data structures built from binary trees that aren’t BSTs. k-d Trees store multidimensional data. Binary tries store strings of bits.

So I think the best description here is “binary heaps are binary trees that are complete and obey the heap property, which isn’t the same as a binary search tree even though they have the same underlying shape (more or less).”

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • You're first paragraph seems correct AFAICS, however I think terminology for library and software development often uses binary-tree as a short-hand for binary-search-tree. So I'm still not sure: https://en.wikipedia.org/wiki/Priority_queue#Usual_implementation first mentions a `heap`, and a binary tree in the second example. The page on heaps calls this a `binary-heap`, not a `binary-tree`, asking this question because I'm unsure if these terms are used loosely or if there are strict differences here. – ideasman42 Oct 31 '17 at 00:41
  • A heap is generally considered an abstract data *type*, not a data structure, as it an almost canonical example of a type that can be implemented equally well using either a tree or an array. – chepner Oct 31 '17 at 01:24
  • @chepner I've typically seen "heap" used as shorthand for "binary heap," the specific data type, with "priority queue" used as the abstract data type. – templatetypedef Oct 31 '17 at 01:59
  • 1
    @ideasman42 People do often use "binary tree" to mean "binary search tree," but I'd say that that's just being sloppy with terminology. "Priority queue" is the general category of data structures that support inserting elements and dequeuing in increasing order of priority, with a binary heap (or just "heap" for short) being one possible implementation. – templatetypedef Oct 31 '17 at 02:01
  • @templatetypedef Using "heap" to refer to an implementation is just as sloppy. Heaps can be efficiently implemented using arrays to store an embedded "tree", where the children of element `i` are elements `i/2` and `i/2 + 1`, or as a tree of linked node objects. A priority queue is really an *application* of a heap. – chepner Oct 31 '17 at 11:54