
I was asked this in an interview. I thought the question was too generic to pin down a particular data structure.

However, if we narrow the question down to the following criteria, what would be the best data structure to use:

  1. If insertion speed should be the fastest?
  2. If search for a particular piece of data should be the fastest?

  • Have a look at [big-o-summary-for-java-collections-framework-implementations](http://stackoverflow.com/questions/559839/big-o-summary-for-java-collections-framework-implementations). You'll find the answer there. – MockerTim Jul 19 '16 at 17:57
  • Usually they want you to ask clarifying questions of your own, like: what will the records be used for? Does the dataset rarely change and mostly get read? Or does it see a lot of inserts and removes? – Tin Jul 19 '16 at 18:17
  • For faster insertions, a linked list; for faster searches, a binary search tree. Which one is more important? – SomeDude Jul 19 '16 at 18:38
  • According to bigocheatsheet.com, a hash table is fastest for search, insertion, and deletion at O(1). So why not use a hash table every time to store anything? – Manas Saxena Jul 19 '16 at 19:41
  • @user2713255: Because a hash table doesn't guarantee enumeration order, a hash table will require much more space than a list or array, and you can't in general index a hash table by position. A hash table is a good tool, but it's not a Swiss Army knife. – Jim Mischel Jul 20 '16 at 02:25
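To make those trade-offs concrete, here is a small sketch (the class name and sample strings are just placeholders) of the enumeration-order and positional-access differences mentioned in the last comment:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrderSketch {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();

        for (String s : new String[] {"banana", "apple", "cherry"}) {
            list.add(s);
            set.add(s);
        }

        // The list preserves insertion order and supports positional access.
        System.out.println(list);        // [banana, apple, cherry]
        System.out.println(list.get(0)); // banana

        // The HashSet's iteration order is unspecified and there is no get(int index).
        System.out.println(set);         // order depends on hashing
    }
}
```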

1 Answer


A HashSet provides O(1) expected-time insertion and O(1) expected-time search, which is hard to beat from a theoretical point of view.
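For reference, a minimal sketch of those two operations (the class name and sample elements are just placeholders):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetSketch {
    public static void main(String[] args) {
        Set<String> names = new HashSet<>();

        // add() runs in O(1) expected time
        names.add("alice");
        names.add("bob");

        // contains() also runs in O(1) expected time
        System.out.println(names.contains("alice")); // true
        System.out.println(names.contains("carol")); // false
    }
}
```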

In practice, for a size of 10,000 references, a sorted ArrayList might still outperform a HashSet, even though insertion is O(n) and search is O(log n). Why? Because it stores the data (the references, at least) in a contiguous memory range and can therefore take advantage of hardware memory caching.
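To make the "sorted ArrayList" idea concrete, here is one possible sketch; the helper names insertSorted and containsSorted are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedListSketch {
    // Keep the list sorted on insert: O(log n) to find the slot, O(n) to shift elements.
    static void insertSorted(List<Integer> list, int value) {
        int idx = Collections.binarySearch(list, value);
        if (idx < 0) {
            idx = -idx - 1; // binarySearch returns (-(insertion point) - 1) when the value is absent
        }
        list.add(idx, value);
    }

    // O(log n) membership test via binary search on the sorted list.
    static boolean containsSorted(List<Integer> list, int value) {
        return Collections.binarySearch(list, value) >= 0;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        insertSorted(list, 5);
        insertSorted(list, 1);
        insertSorted(list, 3);
        System.out.println(list);                    // [1, 3, 5]
        System.out.println(containsSorted(list, 3)); // true
    }
}
```

Because the backing array is contiguous, both the binary search and the element shifting are cache-friendly, which is where the potential win over HashSet would come from.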

The issue with big-O notation is that it ignores constant factors, i.e. the actual time a single operation takes. That's fine for asymptotic reasoning and very large data sets, but for a size of 10,000 it can be misleading.

I haven't tried it, though. And I bet your interviewer hasn't either :).
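If you do want to try it, a quick-and-dirty timing sketch might look like the following. This is not a proper benchmark (no warm-up, no JMH), so treat the numbers as rough; the size of 10,000 matches the discussion above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RoughTimingSketch {
    public static void main(String[] args) {
        final int n = 10_000;
        int[] values = new Random(42).ints(n).toArray();

        // HashSet: insert everything, then look everything up.
        long t0 = System.nanoTime();
        Set<Integer> set = new HashSet<>();
        for (int v : values) set.add(v);
        for (int v : values) set.contains(v);
        long hashNanos = System.nanoTime() - t0;

        // Sorted ArrayList: insert while keeping order, then binary-search everything.
        long t1 = System.nanoTime();
        List<Integer> list = new ArrayList<>();
        for (int v : values) {
            int idx = Collections.binarySearch(list, v);
            if (idx < 0) idx = -idx - 1;
            list.add(idx, v);
        }
        for (int v : values) Collections.binarySearch(list, v);
        long listNanos = System.nanoTime() - t1;

        System.out.printf("HashSet: %d ms, sorted ArrayList: %d ms%n",
                hashNanos / 1_000_000, listNanos / 1_000_000);
    }
}
```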

Frank Puffer