4

Full disclosure this is for a homework problem, as such, i won't be asking for specific solutions, just general answers to some questions. The text of the problem is as follows:

Given a sorted array of type T that must implement the Comparable interface, write a Java generic method that finds a specific element in the array and returns it, or returns null if it is not found. Note that your algorithm must run worse-case in log base 3 time, a log base 2 (binary search) solution will receive no credit

There is also an additional stipulation that the algorithm must be iterative, and not recursive.

1) So my first question is, How do I determine that an algorithm runs in log base 3 as opposed to log base 2? The two only differ by a constant factor, so even if i analyze the structure of the algorithm, how can I know if i'm running in Log3(n) time, or just running in (~0.63)(Log2(n)) time? Does it even matter?

2) I'm under the impression that binary search is more or less the standard fast algorithm for searching sorted arrays so other than binary and linear search, what other search algorithms are there that I can look to for inspiration on this?

3) Is there something that i'm missing, some stipulation that would allow the array to be searched faster than a standard binary search might?

Any help is much appreciated, sorry if this question is fairly specific but I think that some parts of it can be generally applicable.

  • @dhS: not true, we're here for well-posed questions that can enhance the knowledge-base of questions and answers that is SO's quest. IMHO this question fits that requirement, hence my upvote. – Bathsheba Sep 19 '16 at 10:39
  • It's unclear what you're asking because you haven't specify what notation you mean. – xenteros Sep 19 '16 at 10:47
  • 2
    It is just me or the assignment implies to use search in base 3 instead of 2 ... so splitting into thirds instead of halves of index ... which is `T(log3(n))` but needs 2x `if` per iteration instead of single one. There are other searches then binary out there you can for example take a look at this [How approximation search works](http://stackoverflow.com/a/36163847/2521214) – Spektre Sep 19 '16 at 11:03
  • 2
    As you have figured out, the assignment makes no sense. You're probably expected to write a ternary search, which is NOT going to be faster than a binary search. If computer science is important to you, though, you should figure out if you're really being taught by someone who doesn't actually know any computer science, and what you're going to do about it. – Matt Timmermans Sep 19 '16 at 12:13
  • To play devil's advocate, ternary search isn't particularly useful in an array, but it *might* be a prelude to a ternary search tree, which introduces the concept of a search tree with a higher branching factor, which *can* be extremely useful in cache-aware algorithms. – chepner Sep 19 '16 at 12:26
  • 1
    Do tell us what the problem author had in mind! If you already know, that is. – Gassa Sep 21 '16 at 00:18

3 Answers3

2
  1. True, O(log(n)) does not care whether log is base 2 or base 3.

  2. Assume now the task is to use no more than a certain number of comparisons. We can prove that solving it in no more than log3(N) comparisons with no constant factor in the worst case is impossible. For starters, if the array has three elements, it is impossible to find the element or report it missing after just one comparison. Similarly, for a nine-element array, two comparisons are clearly not enough. One can use pigeonhole principle and some care to prove it's impossible for large N, too, though I admit I failed to make a short proof in a few minutes.
    The idea of proof I'd try for large N is this (the details got tedious though). If all array elements are distinct and also different from X, the one we search for, we get only K bits of information after asking K questions, so can distinguish between no more than 2K segments of the array where X could be. Then, there exists a segment of certain length (say, 3 elements), and in it, an element we did not compare with but which can still be equal to X. Change it to X, and we obtain a counterexample array.

Gassa
  • 8,546
  • 3
  • 29
  • 49
  • `We can prove that [deciding occurrence in <=] log3(N) comparisons with no constant factor in the worst case is impossible. For starters, if the array has three elements, it is impossible to find the element or report it missing after just one comparison. Similarly, for a nine-element array, …` formally, not exceeding the limiting function is sufficient above some threshold input value x₀. Further, this seems to assume comparisons between exactly two input values (agreed) and two possible outcomes - there have been machine implementations (and programming language contortions) with <, =, and > – greybeard Sep 21 '16 at 03:02
  • @greybeard You are right that, for a Java `Comparable`, there are three possible outcomes. However, with my assumption (*all array elements are distinct and also different from X*), there are only two outcomes left. We later break this assumption but only for one element we never compared X to. – Gassa Sep 21 '16 at 15:44
1

The primary benefit of binary search is simplicity; given a subarray with endpoints x and y, you make one comparison and one midpoint computation to determine the next subarray (either [x,(x+y)/2] or [(x+y)/2, y].

With ternary search, you need to make two comparisons and two midpoint computations in order to determine which of [x, (x+y)/3], [(x+y)/3, 2*(x+y)/3], or [2*(x+y)/3, y] to search next. You narrow down the range faster, but at the expense of more work.

Asymptotically, they are both O(lg n). Binary search wins because it is simple, but also because you assume that accessing the next key is no more expensive than the comparison used to determine it. For an array, which is stored contiguously in memory, this is eventually true, since eventually the subarray fits in main memory, then fits entirely in each successive level of cache. (That is, arrays have good spatial locality).

For data that is not stored in an array--say, in a tree--the cost of fetching the next node could far outweigh the cost of the comparison(s) needed to determine which node to fetch. In such a case, you want to load as much useful data into memory as possible at each fetch. A ternary tree probably isn't much better than a binary tree in this case, but something like a B-tree, where the size of each internal node (which determines how many keys can be stored in the node) is tuned to the size of your local cache, can greatly reduce the actual running time, even though is it still an O(lg(n)) like binary search.

chepner
  • 497,756
  • 71
  • 530
  • 681
-2

I believe you can do better than a simple binary search if you do something smarter than merely halving the collection on each iteration.

You can instead choose the pivot by inspecting the input viewed more as a monotonic function than as a sequence of values.

The Brent algorithm uses this approach and will get you faster convergence.

See https://en.wikipedia.org/wiki/Brent%27s_method

If it were my assignment I'd base my solution on that.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
  • @Downvoter, if you have concerns over my choice of algorithm then do please let me know. – Bathsheba Sep 19 '16 at 11:09
  • Brent's method is an interesting read, but its relation to this question is unlikely (and that may explain the downvotes). The method aims to speed up the search for well-behaved continuous functions. One problem is that a sorted array is not continuous; this can be overcome by proper rounding. The other is that the worst case time complexity is O(log^2(n)), which is inferior to binary search. Lastly, base 3 logarithm seems unrelated to Brent's method anyway. – Gassa Sep 20 '16 at 00:48