0

I have two lists of phone numbers. 1st list is a subset of 2nd list. I ran two different algorithms below to determine which phone numbers are contained in both of two lists.

  • Way 1:
    • Sortting 1st list: Arrays.sort(FirstList);
    • Looping 2nd list to find matched element: If Arrays.binarySearch(FistList, 'each of 2nd list') then OK
  • Way 2:
    • Convert 1st list into HashMap with key/valus is ('each of 1st list', Boolean.TRUE)
    • Looping 2nd list to find matched element: If FirstList.containsKey('each of 2nd list') then OK

It results in Way 2 ran within 5 seconds is faster considerably than Way 1 with 39 seconds. I can't understand the reason why.

I appreciate your any comments.

Binh Le
  • 353
  • 3
  • 11

4 Answers4

5

Because hashing is O(1) and binary searching is O(log N).

user207421
  • 305,947
  • 44
  • 307
  • 483
1

HashMap relies on a very efficient algorithm called 'hashing' which has been in use for many years and is reliable and effective. Essentially the way it works is to split the items in the collection into much smaller groups which can be accessed extremely quickly. Once the group is located a less efficient search mechanism can be used to locate the specific item.

Identifying the group for an item occurs via an algorithm called a 'hashing function'. In Java the hashing method is Object.hashCode() which returns an int representing the group. As long as hashCode is well defined for your class you should expect HashMap to be very efficient which is exactly what you've found.

There's a very good discussion on the various types of Map and which to use at Difference between HashMap, LinkedHashMap and TreeMap

My shorthand rule-of-thumb is to always use HashMap unless you can't define an appropriate hashCode for your keys or the items need to be ordered (either natural or insertion).

Community
  • 1
  • 1
sprinter
  • 27,148
  • 6
  • 47
  • 78
0

Look at the source code for HashMap: it creates and stores a hash for each added (key, value) pair, then the containsKey() method calculates a hash for the given key, and uses a very fast operation to check if it is already in the map. So most retrieval operations are very fast.

0

Way 1:

  • Sorting: around O(nlogn)

  • Search: around O(logn)

Way 2:

  • Creating HashTable: O(n) for small density (no collisions)

  • Contains: O(1)

M. Shaw
  • 1,742
  • 11
  • 15