Determining the missing number with duplicates in the list

Question

I have a list l = [3,1,2,5,3]. My goal is to find the repeated number in O(n) time, which here is 3, and also the missing number which is 4. Finally, the output should be [3, 4].

I trying to use a dictionary to find the repeated number but again to find the missing number I am using another loop which results in O(n^2) time complexity.

Can anyone tell how to find the answer in O(n) time complexity?

Another loop would not necessarily make it O(n^2), not if the two loops run in sequence. So one loop over `l` to gather the counts, then another loop over a range from 1 through to 5 to determine the missing number and the repeated number, would make two sequential loops that add up to O(2n), which is O(n). — Martijn Pieters, Jul 14 '18 at 14:17
Does your algorithm get access to the size of the range, like the start and end number? — Martijn Pieters, Jul 14 '18 at 14:18
So in the initial loop, also determine the boundaries, the minimum and maximum values, or gather those separately (which actually will be faster anyway). — Martijn Pieters, Jul 14 '18 at 14:23
Isn't [this](https://stackoverflow.com/a/14946745/1566221) what you are asking for? — rici, Jul 14 '18 at 22:40

Martijn Pieters · Accepted Answer · 2018-07-14T16:26:52.643

You'd only get a O(n^2) solution if you nested your loops. Multiple sequential loops of O(n) (so the next loop running once the preceding loop has completed) add up to a O(n) final solution, you can use as many of those as you need.

If your input doesn't include start and end numbers and you only know that they are going to be a sequence of positive integers with one number somewhere in the middle missing, the solution would be to first count the numbers and record the minimum and maximum number. Then loop from minimum to maximum, and include any number that is either missing or has a count of 2. Count using a dictionary, that way you'll also know what numbers are missing:

def missing_or_doubled(inputlist):
    counts = {}
    for n in inputlist:  # O(n) loop
        counts[n] = counts.get(n, 0) + 1
    start, end = min(counts), max(counts)  # two sequential O(n) loops
    # final O(n) loop to find missing number and number that appears twice
    return [i for i in range(start, end + 1) if counts.get(i) in {2, None}]

That's 4 different O(n) loops, all sequential. You can determine start and end inside the for n in inputlist loop too, but that's going to be slower; the min() and max() functions are implemented in C and their lower constant cost will beat any attempt implemented in pure Python.

Demo:

>>> missing_or_doubled([3, 1, 2, 5, 3])
[3, 4]
>>> missing_or_doubled([14, 19, 17, 13, 12, 10, 16, 17, 18, 22, 11, 15, 20])
[17, 21]

If you do know the start or end number up front (either because it is passed to your code as a parameter, or the problem description explicitly states what the start or end number will be), then just replace either the min(counts) or max(counts) assignments with that information. For example, if the starting number is always supposed to be 1, just use range(1, end + 1).

it is failing for [2,2] input. The algorithm should return [2,1] — merkle, Jul 14 '18 at 14:47
You didn't specify how the algorithm should handle the case if there is "no number missing" (I'd say there is no number missing for an input of `[2,2]`), so you cannot blame the answer for not handling that case. — MSeifert, Jul 14 '18 at 14:48
@merkle: that means there *must* be a known starting point then. My answer clearly qualifies the context: *If your input doesn't include start and end numbers and you only know that they are going to be a sequence of positive integers with one number somewhere in the middle missing*. So instead of determining `start` with `min(counts)`, just use `start = 1`. — Martijn Pieters, Jul 14 '18 at 14:49
@merkle: this is why I specifically asked you if you knew more about the start and end numbers. Your answer to that question was ambiguous, so I went with the explanation that the start and end values are missing, and made an assumption that they would be positive integers. — Martijn Pieters, Jul 14 '18 at 14:50
Sorry from my side for not giving a clear explanation. Thanks a lot, @Martjin for helping. I will be clear enough from the next time. — merkle, Jul 14 '18 at 14:55

Determining the missing number with duplicates in the list

1 Answers1