
I took a test on Codility: https://codility.com/programmers/lessons/2-arrays/odd_occurrences_in_array/ and I noticed a performance difference between two different solutions:

1 - List solution:

def solution(list)
  unmatched_elements = []
  list.each{ |el|
    if unmatched_elements.include? el
      unmatched_elements.delete el
    else
      unmatched_elements.push el
    end
  }
  unmatched_elements[0]
end

2 - Hash solution:

def solution(a)
  unmatch = {}

  a.each do |e|
    if unmatch[e].nil?
      unmatch[e] = 1
    else
      unmatch.delete e
    end
  end
  unmatch.keys.first
end

The first one gave me a 25% performance score with some timeouts. The second one gave me a 100% performance score. Why is that? I thought pushing into a Hash would result in O(n) space complexity, just like a List, but it seems that's not the case. Why?

Jirico
1 Answer


It's not about space complexity, it's about time complexity. Specifically, looking up an element in an array (`include?`) is an O(N) operation, since it has to check each element until it finds a match. A Hash lookup (`[]`) is constant time.

This answer explains why Hash has O(1) search time: https://stackoverflow.com/a/4363602/1034681

max pleaner
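
A minimal benchmark sketch (added here for illustration, not part of the original answer) that shows the lookup-cost difference, using Ruby's standard Benchmark module; the sizes and iteration counts are arbitrary:

require 'benchmark'

n = 100_000
arr  = (1..n).to_a
hash = arr.each_with_object({}) { |x, h| h[x] = true }

Benchmark.bm(15) do |bm|
  # Array#include? scans elements one by one: O(N) per lookup
  bm.report('Array#include?') { 1_000.times { arr.include?(n) } }
  # Hash#[] hashes the key and jumps to its bucket: O(1) per lookup
  bm.report('Hash#[]')        { 1_000.times { hash[n] } }
end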
  • In the test conditions it says: "expected worst-case time complexity is O(N); expected worst-case space complexity is O(1);". That's why I thought space was the problem. Since the test returned timeouts in the array case, does that mean array search can be worse than O(N)? – Jirico Mar 12 '17 at 03:28
  • Unfortunately, I don't know. – max pleaner Mar 12 '17 at 03:32
  • @Jirico: it doesn't make sense to talk about "O(N) expected worst-case time complexity" without specifying a) what you mean by "N" and b) what you mean by "time". In general, when talking about algorithmic complexity, you always need to specify a machine model (which tells you which operations are allowed) and a cost model (which tells you how costly each of the allowed operations is). You also need to specify what assumptions you can make about your data, and lastly, you need to specify how you define "input size". For example: on a Turing Machine, copying a list takes O(n²) steps, not O(n). … – Jörg W Mittag Mar 12 '17 at 10:23
  • In the standard Random-Access-Machine model (the most widely used, since it is the one closest to how modern mainstream general-purpose computers work), comparison-based sorting takes O(n log n) comparisons. But in a machine model without random access, where you are only allowed to swap adjacent elements, comparison-based sorting takes O(n²) comparisons. OTOH, if you can assume a certain kind of structure on your items, then you can sort in O(n) time (e.g. radix sort, counting sort) without using comparisons. In your specific case, you need to iterate over every element of `list` and in … – Jörg W Mittag Mar 12 '17 at 10:26
  • … every iteration of the loop, you need to iterate twice over every element of `unmatched_elements` (once for `include?` and again for `delete`). `unmatched_elements` can grow and shrink, so the analysis is non-obvious, but the worst case is an array that looks like this: `[1, 2, 3, 4, 5, 6, 7, …, 1000000, 1234567890, 1000000, 999999, 999998, …]`, where it can grow to `list.size / 2`. That means your complexity is O(list.size * 2*unmatched_elements.size), which is O(list.size²)! – Jörg W Mittag Mar 12 '17 at 10:44
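
To make the worst-case analysis above concrete, here is a small sketch (added for illustration, not from the original thread) that builds such an input and times both solutions; the size is kept small so the quadratic version still finishes in a reasonable time:

require 'benchmark'

# Worst-case input: an ascending run, the single odd element, then a descending
# run, so unmatched_elements grows to roughly list.size / 2 before shrinking.
n = 10_000
worst_case = (1..n).to_a + [1_234_567_890] + n.downto(1).to_a

def list_solution(list)
  unmatched = []
  list.each do |el|
    if unmatched.include?(el)
      unmatched.delete(el)
    else
      unmatched.push(el)
    end
  end
  unmatched.first
end

def hash_solution(a)
  unmatch = {}
  a.each do |e|
    if unmatch[e].nil?
      unmatch[e] = 1
    else
      unmatch.delete(e)
    end
  end
  unmatch.keys.first
end

Benchmark.bm(20) do |bm|
  bm.report('list solution O(n^2)') { list_solution(worst_case) }
  bm.report('hash solution O(n)')   { hash_solution(worst_case) }
end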