
Suppose I have a Collection of objects:

List<String> myList = populateMyArrayList();
// an ArrayList with 1000 elements

Which is the better approach:

1 : Mergesort then Binary Search

Collections.sort(myList);
int keyIndex = Collections.binarySearch(myList, key);

2 : Sequential Search

for(String s : myList){
   if(s.equals(key)){
      return s;
   }
}

Should the searching approach differ based on the size of the collection to be searched? If yes, how do I decide?

EDIT1: Suppose I have to search the list a couple of times, and no new elements will be added to the list.

EDIT2: I could have gone for a HashSet, but I actually have a List<CustomObject> and I may search the List multiple times based on different attributes of CustomObject. So I can't rely on a single overridden equals method in my CustomObject.
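For illustration, one way to search by different attributes without touching equals is to sort and binary-search with a per-attribute Comparator (the CustomObject fields below are hypothetical, just to make the sketch self-contained):

```java
import java.util.*;

class CustomObject {
    String name;  // hypothetical attributes for illustration
    int age;
    CustomObject(String name, int age) { this.name = name; this.age = age; }
}

public class AttributeSearch {
    // Sort by one attribute, then binary-search with the same Comparator.
    // Re-sorting with a different Comparator allows searching on another attribute.
    static int findByName(List<CustomObject> list, String name) {
        Comparator<CustomObject> byName = Comparator.comparing(o -> o.name);
        list.sort(byName);                       // O(n log n), done once per attribute
        return Collections.binarySearch(list, new CustomObject(name, 0), byName); // O(log n)
    }

    public static void main(String[] args) {
        List<CustomObject> list = new ArrayList<>(Arrays.asList(
                new CustomObject("bob", 30),
                new CustomObject("alice", 25),
                new CustomObject("carol", 35)));
        int i = findByName(list, "alice");
        System.out.println(i >= 0 ? list.get(i).age : -1); // prints 25
    }
}
```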

Zeeshan

4 Answers


It depends.

  • If you are searching for only one string, a linear search is better because it is O(n).
  • If you are searching for multiple strings, sorting first and then binary searching may be better: the sort is O(n log n) and each lookup is O(log n), so k lookups cost O(n log n + k log n). If you are checking for roughly n strings, this approach wins.
  • If you only want to know whether your Collection contains an element (ignoring order), consider a HashSet, whose contains is O(1) on average.
  • If you need to preserve insertion order and still want a fast contains method, use a LinkedHashSet.
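A minimal sketch of the three options above, on plain Strings:

```java
import java.util.*;

public class SearchChoices {
    public static void main(String[] args) {
        List<String> myList = new ArrayList<>(Arrays.asList("pear", "apple", "plum"));

        // One-off lookup: a linear scan, O(n)
        boolean foundOnce = myList.contains("apple");

        // Repeated lookups: sort once (O(n log n)), then O(log n) per search
        Collections.sort(myList);
        boolean foundSorted = Collections.binarySearch(myList, "plum") >= 0;

        // Membership only, order irrelevant: HashSet, O(1) average per lookup
        Set<String> set = new HashSet<>(myList);
        boolean foundHashed = set.contains("apple");

        System.out.println(foundOnce + " " + foundSorted + " " + foundHashed); // true true true
    }
}
```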

P.S. Premature optimization is the root of all evil.

Absurd-Mind
  • How is that premature? – Thomas Jungblut May 26 '14 at 11:12
  • He didn't check whether a naive solution suffices for his needs. – Absurd-Mind May 26 '14 at 11:13
  • It is worth mentioning that "big O" notation expresses the behavior of the algorithm as `n` varies, but it does not imply that one algorithm is "faster" than another for a given `n` value. An `O(log n)` algorithm could be slower than an `O(n^2)` algorithm for a given `n` (if `n` is big enough, in the end the algorithm with the better function will win, but such an `n` value may be large enough to be meaningless). Not that this comment belongs only to your answer, but I had to post it somewhere. – SJuan76 May 26 '14 at 11:15
  • @SJuan76 Big O assumes that `n` tends to infinity, so it does not say anything about small `n`. – Absurd-Mind May 26 '14 at 11:17
  • That is my point... For a limited value (`1000` in this case), determining which algorithm is "faster" is not so easy. – SJuan76 May 26 '14 at 11:20
  • @SJuan76 I did not say anything about speed. The question was 'what is more efficient', and just by looking at the formulas it is linear search in the case of one search string. Furthermore I tried to suggest that the OP should first try whether any approach suffices ("Premature optimization..."). – Absurd-Mind May 26 '14 at 11:24

If you do the search just one time:

  • Complexity for sort + binary search will be O(n * log n).
  • Complexity for linear search is O(n).

If you do the search for more than one time, let's say k times:

  • Complexity for sort + binary search will be O((n + k) * log n).
  • Complexity for linear search will be O(k * n).

So if you do the search just once, you should go with linear search. If you do the search more than once, most probably you should sort first.

Also, maybe in this case you can consider using a hash table, which has amortized complexity of O(1) for an element search.
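A sketch of that hash-table idea, here as a HashMap from element to its first index (the helper name `indexOfFirst` is made up for this example):

```java
import java.util.*;

public class HashIndex {
    // Build the index once in O(n); each lookup is then amortized O(1).
    static Map<String, Integer> indexOfFirst(List<String> list) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < list.size(); i++) {
            index.putIfAbsent(list.get(i), i); // keep the first occurrence
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> myList = Arrays.asList("pear", "apple", "plum", "apple");
        Map<String, Integer> index = indexOfFirst(myList);
        System.out.println(index.get("plum"));         // 2
        System.out.println(index.containsKey("kiwi")); // false
    }
}
```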

Andrei Bozantan
  • Complexity for sort + binary search will be O(n log n + log n), not O(n * log n) – TheLostMind May 26 '14 at 11:18
  • @TheLostMind In complexity theory O(n * log n + log n) is considered equal to O(n * log n), since only the term with the highest rank is considered important. Please check the other answers for this question also, and reconsider your negative vote. – Andrei Bozantan May 26 '14 at 11:23
  • Complexity for sort + binary search in case of k runs will be O(k * log n) only if k >> n, and it definitely remains O(n * log n) if k = 1. The proper complexity I believe is O((k + n) * log n), or at least a bit of explanation is required. – Archeg May 26 '14 at 11:33
  • @bosonix - My bad. I thought you had entered the complexity of mergesort as "n". Please be more clear the next time around. – TheLostMind May 26 '14 at 11:42
  • @AndreiBozantan why is it O(n * log n)? Is the quicksort (assuming that's the most common sort used in languages) complexity negligible compared to the binary search complexity? – adaba Oct 01 '20 at 20:34

If you only search the list once (or seldom), linear search is cheaper. If you search the list more often, the cost of sorting is repaid: sorting costs O(n log n) average time complexity, and each search afterwards only O(log n). If you search for almost every element, those n binary searches cost O(n log n) as well, so you are "even" with the cost of sorting.
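As a rough illustration for n = 1000 (the list size in the question), counting comparisons as k * n for k linear searches versus n * log2(n) + k * log2(n) for sort plus k binary searches (a back-of-envelope model that ignores constant factors):

```java
public class BreakEven {
    public static void main(String[] args) {
        int n = 1000;
        double log2n = Math.log(n) / Math.log(2); // about 10 for n = 1000
        for (int k : new int[]{1, 10, 100}) {
            double linear = (double) k * n;         // k sequential scans
            double sorted = n * log2n + k * log2n;  // sort once + k binary searches
            System.out.printf("k=%d: linear=%.0f, sort+binary=%.0f%n", k, linear, sorted);
        }
    }
}
```

For k = 1 the linear scan wins; by around k = 10 the two are comparable, and for larger k sorting first wins.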

Stefan

A binary search is O(log n) and is faster than a linear search, which is O(n). However, one must first sort the data, which costs O(n log n) and takes longer than a single linear search.

So if the data is filled once and then searched often, sort it and use binary search. Even better: use a Set, and a HashSet would be nicer still.

Joop Eggen