7

I had an interview today, I was asked how search for a number inside an array, I said binarysearch, he asked me how about a big array that has thousands of objects (for example Stocks) searching for example by price of the stocks, I said binarysearch again, he said sorting an array of thousands will take lot of time before applying binarysearch.

Can you please bear with me and teach me how to approach this problem? thanks your help is appreciated.

Manas Khandelwal
  • 3,790
  • 2
  • 11
  • 24
Majid Kamal
  • 363
  • 3
  • 7
  • 17
  • Generally, to search a large set of things, one uses some sort of hash table. – Hot Licks Mar 23 '12 at 01:04
  • [Which is faster, Hash lookup or Binary search?](http://stackoverflow.com/questions/360040/which-is-faster-hash-lookup-or-binary-search) – Josh Mar 23 '12 at 01:08
  • 2
    @Josh -- Trick question. The binary search is faster if everything's all nicely sorted and you won't ever be modifying the set to be searched. But that's not real life. In real life the hash table almost always wins. – Hot Licks Mar 23 '12 at 01:17

3 Answers3

3

I was asked a similar question.The twist was to search in sorted and then an unsorted array .These were my answers all unaccepted

  1. For sorted I suggested we can find the center and do a linear search .Binary search will also work here
  2. For unsorted I suggested linear again .
  3. Then I suggested Binary which is kind of wrong.
  4. Suggested storing the array in a hashset and utilize hashing . (Not accepted since high space complexcity)
  5. I suggested Tree Set which is a Red Black tree quite good for lookup.(Not accepted since high space complexcity)
  6. Copying into Arraylist etch were also considered overhead.

In the end I got a negative feedback. Though we may think that one of the above is solution but surely there is something special in linear searching which I am missing.

To be noted sorting before searching is also an overhead especially if you are utilizing any extra data structures in between.

Any comments welcomed.

user666
  • 1,104
  • 12
  • 20
  • I would say for sorted binary tree and for unsorted you could sort as your 1) answer. The other way would be to iterate over the array and save the data into hash table O(n) and look up the data over the hash table would be O(1). But the lookup should be within the loop. If the data exists then you don't need to save it. What do you think? – Noah Tony Aug 31 '18 at 16:57
1

I think the interviewer wants you to analyze under different case about the array initial state, what algorithm will you use. Of cause , you must know you can build a hash table and then O(1) can find the number, or when the array is sorted (time spent on sorting maybe concerned) , you can use binarysearch, or use some other data structures to finish the job.

jianpx
  • 3,190
  • 1
  • 30
  • 26
1

I am not sure what he had in mind.

If you just want to find the number one time, and you have no guarantees about whether the array is sorted, then I don't think you can beat linear search. On average you will need to seek halfway through the array before you find the value, i.e. expected running time O(N); when sorting you have to touch every single value at least once and probably more than that, i.e. expected running time O(N log N).

But if you need to find multiple values then the time spent sorting it pays off quickly. With a sorted array, you can binary search in O(log N) time, so for sure by the third search you are ahead if you invested the time to sort.

You can do even better if you are allowed to build different data structures to help with the problem. You could build some sort of index, such as a hash table; but the champion data structure for this sort of problem probably would be some sort of tree structure. Then you can insert new values into the tree faster than you could append new values and re-sort the array, and the lookup is still going to be O(log N) to find any value. There are different sorts of trees available: binary tree, B-tree, trie, etc.

But as @Hot Licks said, a hash table is often used for this sort of thing, and it's pretty cheap to update: you just append a value on the main array, and update the hash table to point to the new value. And a hash table is very close to O(1) time, which you can't beat. (A hash table is O(1) if there are no hash collisions; assuming a good hash algorithm and a big enough hash table there will be almost no collisions. I think you could say that a hash table is O(N) where N is the average number of hash collisions per "bucket". If I'm wrong about that I expect to be corrected very quickly; this is StackOverflow!)

steveha
  • 74,789
  • 21
  • 92
  • 117
  • I did not understand what did you mean by third search ? any exaple plz ? – Majid Kamal Mar 23 '12 at 01:14
  • If you have to search only one time, and then you are done, linear search is fastest. If you have to search two times, linear search might still be faster than sorting plus binary search; on average, linear search will need to go through about half the values, so two linear searches should on average need to go through all the values. If you have to search three times, sorting once and then using binary search for the three searches should be fastest. If you have to search four or more times, it is the same as three times: sort first then do binary search. – steveha Mar 23 '12 at 01:18
  • If you have to search more than twice you're probably better off using the hash table. – Hot Licks Mar 23 '12 at 11:17