5

After having searched the internet I was not able to satisfy myself that I had found a comprehensive set of situations in which a linear search would be preferable to a binary search.

I am essentially wondering whether it would be possible to compile a relatively definitive list of advice (from the point of view of general programming as one might find in industry). Alternatively I would much appreciate it if it could be verified that indeed I have seen all there is to say on the subject.

user3444063
  • 63
  • 1
  • 1
  • 6
  • If the data is initially unsorted, linear search will definitely be faster than sorting followed by binary search, if you are only searching once. – 500 - Internal Server Error Mar 20 '14 at 22:03
  • "is only to be searched once" can be expanded to "will be searched a small number of times relative to the size of the list". As you say, "small" is a vague notion that can only be decided by benchmarking a specific situation. – Mark Ransom Mar 20 '14 at 22:15
  • 1
    (6) You have more than one-dimensional data (then binary search becomes harder) – Niklas B. Mar 20 '14 at 23:36
  • No, then you just sort lexicographically and binary search anyway. – user2357112 Mar 21 '14 at 01:05
  • @user2357112 I obviously meant that you want to search in a combination of dimension that you didn't index, or you want to perform range searches. Lexicographical ordering doesn't buy you a lot when you're not interested in the first component. Sure we can use multi-dimensional data structures and "binary search" in those, but as I said it gets more complicated – Niklas B. Mar 21 '14 at 01:14
  • I think you should post your 5 points as an answer, and perhaps make it [community wiki](http://stackoverflow.com/help/privileges/community-wiki). The question's current format isn't a good fit for a Q&A site. – Bernhard Barker Mar 21 '14 at 01:47
  • sure, does this work better? would you mind explaining what was wrong with the format please. Did i mix the problem with potential (and potentially flawed) answers? – user3444063 Mar 21 '14 at 17:36
  • Possible duplicate of [At which n does binary search become faster than linear search on a modern CPU?](https://stackoverflow.com/questions/1275665/at-which-n-does-binary-search-become-faster-than-linear-search-on-a-modern-cpu) – phuclv Apr 05 '18 at 01:07

2 Answers2

1

My list of reasons for choosing a linear search over a binary search are as follows:

  1. The list is unsorted and is only to be searched once

  2. The list is small (though this itself is a vague notion - I've read less than around 100 elements?)

  3. The list will need sorting following the search operation (due to say an insertion), since the resorting will dominate the time complexity of the overall task

  4. The data structure is not random access (like a linked-list)

  5. There is no knowledge of the data that could aid searching (relative proximities?)

user3444063
  • 63
  • 1
  • 1
  • 6
1

You probably can't come up with a definitive list. For example, I did some tests a while back with searching sorted lists in .NET. With a sorted list of integers, binary search turned out to be faster than sequential search when the number of items was 13. With a sorted list of strings, that number was 8. With other types for which comparison was more expensive, the number was even smaller.

Running the same test using a different language or runtime library will give you different numbers. It could even depend on the memory access hardware and probably some other hardware considerations.

The conventional wisdom was (perhaps still is) that sequential search was so much simpler than binary search that the reduced complexity gave it a large advantage on small lists. The truth today is that CPU speeds and memory access are so fast that the simplicity of sequential search is a factor only when the lists are very small.

At best you can come up with a definitive set of rules that apply to one runtime configuration on specific hardware when comparing a specific data type. If you change environments or change data types, you have to write tests to benchmark it all over again.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351