0

Given the array :

int arr[]= {1, 2, 3, 2, 3, 1, 3}

You are asked to find a number within the array that occurs odd number of times. It's 3 (occurring 3 times). The time complexity should be at least O(n). The solution is to use an HashMap. Elements become keys and their counts become values of the hashmap.

// Code belongs to geeksforgeeks.org
// function to find the element occurring odd 
    // number of times 

    static int getOddOccurrence(int arr[], int n) 
    { 
        HashMap<Integer,Integer> hmap = new HashMap<>(); 
        // Putting all elements into the HashMap 
        for(int i = 0; i < n; i++) 
        { 
            if(hmap.containsKey(arr[i])) 
            { 
                int val = hmap.get(arr[i]); 
                // If array element is already present then 
                // increase the count of that element. 
                hmap.put(arr[i], val + 1);  
            } 
            else
                // if array element is not present then put 
                // element into the HashMap and initialize  
                // the count to one. 
                hmap.put(arr[i], 1);  
        } 

        // Checking for odd occurrence of each element present 
          // in the HashMap  
        for(Integer a:hmap.keySet()) 
        { 
            if(hmap.get(a) % 2 != 0) 
                return a; 
        } 
        return -1; 
    } 

I don't get why this overall operation takes O(N) time complexity. If I think about it, the loop alone takes O(N) time complexity. Those hmap.put (an insert operation) and hmap.get(a find operations) take O(N) and they are nested within the loop. So normally I would think this function takes O(N^2) times. Why it instead takes O(N)?.

2 Answers2

1

The algorithm first iterates the array of numbers, of size n, to generate the map with counts of occurrences. It should be clear why this is an O(n) operation. Then, after the hashmap has been built, it iterates that map and finds all entries whose counts are odd numbers. The size of this map would in practice be somewhere between 1 (in the case of all input numbers being the same), and n (in the case where all inputs are different). So, this second operation is also bounded by O(n), leaving the entire algorithm O(n).

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
1

I don't get why this overall operation takes O(N) time complexity.

You must examine all elements of the array - O(N)

For each element of the array you call contain, get and put on the array. These are O(1) operations. Or more precisely, they are O(1) on averaged amortized over the lifefime of the HashMap. This is due to the fact that a HashMap will grow its hash array when the ratio of the array size to the number of elements exceeds the load factor.

O(N) repetitions of 2 or 3 O(1) operations is O(N). QED

Reference:


Strictly speaking there are a couple of scenarios where a HashMap is not O(1).

  • If the hash function is poor (or the key distribution is pathological) the hash chains will be unbalanced. With early HashMap implementations, this could lead to (worst case) O(N) operations because operations like get had to search a long hash chain. With recent implementations, HashMap will construct a balanced binary tree for any hash chain that is too long. That leads to worst case O(logN) operations.

  • HashMap is unable to grow the hash array beyond 2^31 hash buckets. So at that point HashMap complexity starts transitioning to O(log N) complexity. However if you have a map that size, other secondary effects will probably have affected the real performance anyway.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • So technically it doesn't matter if there are some collisions occur (internally) during the hashing operation. We exclude those considerations over particular DS implementation? – Plain_Dude_Sleeping_Alone May 30 '20 at 03:07
  • Nah ok i understand. Beside those operations (within the loop) don't depend on *N* anyway. Thanks. – Plain_Dude_Sleeping_Alone May 30 '20 at 03:15
  • 1
    Correct. The complexity analysis is based on the average length of a hash chain. The HashMap strategy of growing the array should keep the *average* length small; i.e. less than 2. – Stephen C May 30 '20 at 03:17