Find the first missing positive integer in Python

Question

I have the following task:

Given an unsorted integer array, find the first missing positive integer. Your algorithm should run in O(n) time and use constant space.

After thinking and getting a hint, I decided to change the input list A. The following is the code:

def firstMissingPositive(A):
        m=max(A)
        ln=len(A)
        i=0
        while i<ln:
            if A[i]>=1 and A[i]<=ln:
                if A[A[i]-1]!=m+1:

                   A[A[i]-1], A[i] = m+1, A[A[i]-1]
                else:
                    i+=1

            else:
                i+=1
        for i in range(ln):
            if A[i]!=m+1:
                return i+1

When I run it, it takes a long time. What shall I do to make it a bit faster?

EDIT: here is the list A.

A=[ 417, 929, 845, 462, 675, 175, 73, 867, 14, 201, 777, 407, 80, 882, 785, 563, 209, 261, 776, 362, 730, 74, 649, 465, 353, 801, 503, 154, 998, 286, 520, 692, 68, 805, 835, 210, 819, 341, 564, 215, 984, 643, 381, 793, 726, 213, 866, 706, 97, 538, 308, 797, 883, 59, 328, 743, 694, 607, 729, 821, 32, 672, 130, 13, 76, 724, 384, 444, 884, 192, 917, 75, 551, 96, 418, 840, 235, 433, 290, 954, 549, 950, 21, 711, 781, 132, 296, 44, 439, 164, 401, 505, 923, 136, 317, 548, 787, 224, 23, 185, 6, 350, 822, 457, 489, 133, 31, 830, 386, 671, 999, 255, 222, 944, 952, 637, 523, 494, 916, 95, 734, 908, 90, 541, 470, 941, 876, 264, 880, 761, 535, 738, 128, 772, 39, 553, 656, 603, 868, 292, 117, 966, 259, 619, 836, 818, 493, 592, 380, 500, 599, 839, 268, 67, 591, 126, 773, 635, 800, 842, 536, 668, 896, 260, 664, 506, 280, 435, 618, 398, 533, 647, 373, 713, 745, 478, 129, 844, 640, 886, 972, 62, 636, 79, 600, 263, 52, 719, 665, 376, 351, 623, 276, 66, 316, 813, 663, 831, 160, 237, 567, 928, 543, 508, 638, 487, 234, 997, 307, 480, 620, 890, 216, 147, 271, 989, 872, 994, 488, 291, 331, 8, 769, 481, 924, 166, 89, 824, -4, 590, 416, 17, 814, 728, 18, 673, 662, 410, 727, 667, 631, 660, 625, 683, 33, 436, 930, 91, 141, 948, 138, 113, 253, 56, 432, 744, 302, 211, 262, 968, 945, 396, 240, 594, 684, 958, 343, 879, 155, 395, 288, 550, 482, 557, 826, 598, 795, 914, 892, 690, 964, 981, 150, 179, 515, 205, 265, 823, 799, 190, 236, 24, 498, 229, 420, 753, 936, 191, 366, 935, 434, 311, 920, 167, 817, 220, 219, 741, -2, 674, 330, 909, 162, 443, 412, 974, 294, 864, 971, 760, 225, 681, 689, 608, 931, 427, 687, 466, 894, 303, 390, 242, 339, 252, 20, 218, 499, 232, 184, 490, 4, 957, 597, 477, 354, 677, 691, 25, 580, 897, 542, 186, 359, 346, 409, 655, 979, 853, 411, 344, 358, 559, 765, 383, 484, 181, 82, 514, 582, 593, 77, 228, 921, 348, 453, 274, 449, 106, 657, 783, 782, 811, 333, 305, 784, 581, 746, 858, 249, 479, 652, 270, 429, 614, 903, 102, 378, 575, 119, 196, 12, 990, 356, 277, 169, 70, 518, 282, 676, 137, 622, 616, 357, 913, 161, 3, 589, 327 ]

What else do we know about the array? Is there a missing positive integer? Is there only one? Are there duplicate values? — Robᵩ, Dec 10 '16 at 01:39
If the input known to be non-negative? If it's something like a shuffled range from 0 to n, with a single number removed, this is trivial, but you need to be more clear on the known qualities of the input. — ShadowRanger, Dec 10 '16 at 01:43
I'd start by adding `print("Swapping %d"%i)` in the swap branch of the `if`. What do you learn from its output? — Robᵩ, Dec 10 '16 at 01:45
@EliSadoff, if I recall correctly, sorting methods use O(nlogn) — highsky, Dec 10 '16 at 02:35
The elements in the array are numbers. There could be negative number or numbers which are bigger than n. — highsky, Dec 10 '16 at 02:37

Copperfield · Answer 1 · 2016-12-13T02:02:24.860

~~look to me that doing it in both O(n) and constant space is kind of impossible.~~ (I stand corrected, the link of Rockybilly give such solution)

To do it in constant space one is kind of forced to sort the list and that is O(n log n) for most sorting algorithms, and the ones here looks like insertion sort which is in average O(n²)

So FWIW I choose to discard the constant space in order to do it as close to O(n) as I can think of which give me a 2n solution (in the worse case scenario, in average is n+k) (which in big O notation is still O(n) )

def firstMissingSince(sequence, start=1):
    uniques = set()
    maxitem = start-1
    for e in sequence:
        if e >= start:
            uniques.add(e)
            if e > maxitem:
                maxitem = e
    return next( x for x in range(start, maxitem+2) if x not in uniques )

(version 2 bellow )

I could have used set(sequence), and max(sequence) but both are O(n), so I combined them in the same loop, and I use set for two reason: first is a potential reduction in space by ignoring duplicates and in the same way I only care for number greater or equal to my lower bound (which I also make generic) and second is for the O(1) membership testing.

The last line is a simple linear search for the missing element with the property that default to start if the maximum element is lower to start or is the maximum+1 if the array have no missing element between start and its maximum.

here are some tests borrow from the others answers...

assert 1 == firstMissingSince(A) 
assert 2 == firstMissingSince([1,4,3,6,5])
assert 2 == firstMissingSince([1,44,3,66,55]) 
assert 6 == firstMissingSince([1,2,3,4,5]) 
assert 4 == firstMissingSince([-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11])
assert 4 == firstMissingSince([18, 2, 13, 3, 3, 0, 14, 1, 18, 12, 6, -1, -3, 15, 11, 13, -8, 7, -8, -7])
assert 4 == firstMissingSince([-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11])
assert 3 == firstMissingSince([7, -7, 19, 6, -3, -6, 1, -8, -1, 19, -8, 2, 4, 19, 5, 6, 6, 18, 8, 17])

the answer of Rockybilly make me realize that I don't need to know the maximum value at all, so here is version 2

from itertools import count

def firstMissingSince(sequence, start=1):
    uniques = set(sequence) # { x for x in sequence if x>=start } 
    return next( x for x in count(start) if x not in uniques )

It would appear there is indeed a O(n) and constant space solution. It is quite elegant, which uses values as indexes to change the original list. — Rockybilly, Dec 13 '16 at 00:46

Nikhil · Answer 2 · 2021-01-02T03:27:10.470

Code

def first_missing_positive(nums):
    bit = 0
    for n in nums:
        if n > 0:
            bit |= 1 << (n - 1)
    flag = 0
    while bit != 0:
        if (bit & 1 == 0):
            break
        flag += 1
        bit >>= 1
    return flag + 1

Explanation

Usually, for constant space requirements, bitwise solutions are great.

The trick here is to pass through all the integers and store their binary representations in a single variable. Say "bit". e.g. when nums = [1, 2, 3] i.e. nums_bitwise = [1, 10, 11], bit = "11". 11 here represents the sequence [1, 10, 11] in a condensed form.

Now, Let's assume 2 was missing from nums. Then we have nums = [1, 3] i.e. nums_bitwise = [1, 11], bit = "101". We can now loop through all bits of the "bit" variable to find out the first missing positive integer 2 i.e. "0" in "101"

Note that, the value of bit will be 5 and 7 for nums=[1, 3] and nums=[1, 2, 3] respectively. You will need to do "{0:b}".format(bit) for its binary representaion.

The key line

bit |= 1 << (n - 1)

Stores all integers in nums by moving left, bit-by-bit, and ORing bits of integers with the default 0s of the bit variable.

Next, we do

if (bit & 1 == 0):
    break

To find the first zero in the condensed bit variable. First zero represents the first missing integer. Shift right by one bit bit >>= 1 and find if that bit is missing. If not, keep ANDing the last bit of bit variable with 1 until the result is 0.

Analysis

Since we look at every integer in nums only once, the time complexity is O(n). Assuming all the integers can be condensed in a single variable, the space complexity is O(1) i.e. constant space.

This is not O(1) space complexity since the size of the bit variable will grow as the size of the numbers in the list grows. This is O(p) where p is the biggest number in the list. If for example the number `2**35` was in the list, this algorithm would use `2**32` bytes of memory in the bit variable, which is a lot. It is interesting that your memory complexity is completely unrelated to the size of the input, I haven't encountered that before. — redfast00, Oct 02 '20 at 04:02

score 1 · Answer 3 · answered Feb 02 '22 at 17:30

1

Try this:

def firstMissingPositive(A):
        try:
            return min(set(range(1, len(A)+1)) - set(A))
        except:
            return max(1, max(A)+1)

answered Feb 02 '22 at 17:30

spideyonthego

123
1
10

This answer is something – Stephinext Sep 20 '22 at 22:06

score 0 · Answer 4 · edited May 23 '17 at 12:24

May not be the complete cause for long run times but I did find a bug that will cause an infinite loop. I started by creating random integer arrays of length 20.

a = [random.randint(-10, 20) for _ in range(20)]

Added two print statements to see what was happening.

    while i<ln:
        print(A)
        if A[i]>=1 and A[i]<=ln:
            if A[A[i]-1]!=m+1:
                print("Swapping %d"%i)
                A[A[i]-1], A[i] = m+1, A[A[i]-1]
            else:
       ...

With this array as input you get into an infinite loop:

a = [-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11]

>>>
...
[18, 18, -8, -10, -6, 6, 14, 18, -5, 18, 18, 15, 17, 18, 2, 7, 18, 18, 7, 11]
Swapping 5
[18, 18, -8, -10, -6, 6, 14, 18, -5, 18, 18, 15, 17, 18, 2, 7, 18, 18, 7, 11]
Swapping 5
[18, 18, -8, -10, -6, 6, 14, 18, -5, 18, 18, 15, 17, 18, 2, 7, 18, 18, 7, 11]
...

Turns out that if A[A[i]-1] equals A[i] then you end up always putting the same number back into A[i]. In this case i == 5, A[5] == 6 and A[A[i]-1] == 6. In this statement,

A[A[i]-1], A[i] = m+1, A[A[i]-1]

The right hand side is evaluated; m+1 is assigned to A[5]; then 6 is assigned to A[5]. I fixed it for this one case by swapping the assignment order:

A[i], A[A[i]-1] = A[A[i]-1], m+1

With the list you added to the question, it now throws an IndexError with my mod. Even though the right hand side gets evaluated first, it appears that A[A[i]-1] on the left hand side doesn't get evaluated till after the first assignment is made and a large number has been placed in A[i].

Plagiarizing Rob's solution - evaluate [A[i]-1 before any swaps are made:

def firstMissingPositive(A):
    m=max(A)
    ln=len(A)
    print('max:{}, len:{}'.format(m, ln))
    i=0
    while i<ln:
##        print(A[:20])
        if A[i]>=1 and A[i]<=ln:
            if A[A[i]-1]!=m+1:
##                print("Swapping %d"%i)
                v = A[i]-1
                A[i], A[v] = A[v], m+1
            else:
                i+=1
        else:
            i+=1
    for i in range(ln):
        if A[i]!=m+1:
            return i+1

And it still sometimes returns a wrong result so minus one for me. It produces wrong results for the following:

[18, 2, 13, 3, 3, 0, 14, 1, 18, 12, 6, -1, -3, 15, 11, 13, -8, 7, -8, -7]
[-6, 3, 10, 14, 17, 6, 14, 1, -5, -8, 8, 15, 17, -10, 2, 7, 11, 2, 7, 11]
[7, -7, 19, 6, -3, -6, 1, -8, -1, 19, -8, 2, 4, 19, 5, 6, 6, 18, 8, 17]

score 0 · Answer 5 · answered Dec 10 '16 at 03:52

0

FWIW, here is how I did it:

def firstMissingPositive(A):
    for i in range(len(A)):
        while A[i] != i+1 and 0 < A[i] < len(A):
            value = A[i]-1
            A[i], A[value] = A[value], A[i]
    for i, value in enumerate(A, 1):
        if i != value:
            return i
    return len(A)+1

assert firstMissingPositive([1,4,3,6,5]) == 2
assert firstMissingPositive([1,44,3,66,55]) == 2
assert firstMissingPositive([1,2,3,4,5]) == 6
assert firstMissingPositive(A) == 1

answered Dec 10 '16 at 03:52

Robᵩ

163,533
20
239
308

```value = A[i]-1``` definite improvement to make that evaluation separate from the assignment. I didn't realize (or never thought about) how those *multple* assignments work - it's pretty subtle. – wwii Dec 10 '16 at 04:18
```assert f([18, 2, 13, 3, 3, 0, 14, 1, 18, 12, 6, -1, -3, 15, 11, 13, -8, 7, -8, -7]) == 4```? – wwii Dec 10 '16 at 05:08
@wwii - yes, you are right. I falsely assumed that there would be no duplicates. – Robᵩ Dec 11 '16 at 11:52

score 0 · Answer 6 · edited Mar 07 '19 at 01:10

0

def firstMissingPositve(nums):
    if nums == []:
        return 1
    else:
        a = max(nums)
        for i in range(1 , a+2):
            if i not in nums:
                c = i
                return c

edited Mar 07 '19 at 01:10

BML91

2,952
3
32
54

answered Mar 07 '19 at 00:34

user11162722

1

2

Welcome to Stack Overflow, and thanks for answering this question! In order to maximize the usefulness of this answer, I would recommend editing to add some context about what your code does and why. – Pika Supports Ukraine Mar 07 '19 at 01:15

score 0 · Answer 7 · edited May 10 '20 at 13:28

0

def missingNumber(arr, n):
    x = {i for i in arr if i > 0}
    b = max(arr)
    for i in range(1, b + 2):
        if i not in x:
            return i

edited May 10 '20 at 13:28

Jasmijn

9,370
2
29
43

answered May 10 '20 at 10:02

tekuri purushotham

1

I don't see any need for the function argument n. – Mohit Pandey Feb 01 '21 at 21:57

score 0 · Answer 8 · answered Jun 22 '20 at 23:40

My python code below. O(n) time and O(1) space complexities. Inspired by @pmcarpan's high-level solution in this related question. Also check out the fully commented markdown version on my github.

def lowestMissingStrictlyPositiveInteger(arr):
    """ Return the lowest missing strictly 
    positive integer from the array arr. 
    Warning: In order to achieve this in linear time and
    constant space, arr is modified in-place.
    
    Uses separatePositiveIntegers() to isolate all
    strictly positive integers, and marks their occurrence
    with markIndicesOfObservedPositiveIntegers(). This 
    function then scans the modified array for the 'marks'
    and returns the first unmarked value. """
    m = separatePositiveIntegers(arr)
    markIndicesOfObservedPositiveIntegers(arr, m)
    for i in range(m): #O(m)
        if arr[i]>0:
            # this index hasn't been marked by
            # markIndexOfObservedPositiveIntegers(), 
            # therefore the integer i+1 is missing.
            return i+1
    return m+1

def separatePositiveIntegers(arr):
    """ Modify input array in place, so that 
    strictly positive integers are
    all at the start of the array, 
    and negative integers are
    all at the end of the array. 
    
    Return the index of the first negative 
    integer in the updated array (or len(arr)
    if all values are positive). """
    i1, i2 = 0, len(arr)-1
    while (i2 > i1): #O(n)
        
        if arr[i2]<=0:
            # move to the first strictly positive value
            # starting from the end of the array.
            i2 -= 1
            continue
        
        if arr[i1]>0:
            # move to the first negative value
            # from the start of the array.
            i1 += 1
            continue
        
        # swap negative value at i1 with the first
        # strictly positive value starting from the
        # end of the array (i.e., at position i2).
        tmp = arr[i2]
        arr[i2] = arr[i1]
        arr[i1] = tmp
    
    return i1 if arr[i1]<=0 else i1+1

def markIndicesOfObservedPositiveIntegers(arr, m):
    """ Take an array arr of integer values, 
    where indices [0,m-1] are all strictly positive
    and indices >= m are all negative
    (see separatePositiveIntegers() method).
    
    Mark the occurrence of a strictly positive integer
    k<=m by assigning a negative sign to the value in 
    the array at index k-1 (modify in place)."""
    for i in range(m): #O(m)
        # all values at indices [0,m-1] are strictly positive 
        # to start with, but may have been  modified in-place 
        # (switched to negative sign) in this loop. 
        # Therefore, read the untampered value as abs(arr[i]).
        untampered_val=abs(arr[i])
        # We can safely ignore any untampered value strictly superior to m
        # because it guarantees a gap in the integer sequence at a lower value 
        # (since arr only has m strictly positive integers).
        if untampered_val<=m:
            # mark the integer as "seen" by
            # changing the sign of the value at
            # index untampered_val-1 to negative.
            arr[untampered_val-1] = -abs(arr[untampered_val-1])

# test 1
arr = [3, 4, -1, 1]
assert lowestMissingStrictlyPositiveInteger(arr) == 2

# test 2
arr = [2, 0, 1]
assert lowestMissingStrictlyPositiveInteger(arr) == 3

# test 3
arr = [0]
assert lowestMissingStrictlyPositiveInteger(arr) == 1

score 0 · Answer 9 · answered Jan 14 '22 at 06:18

Here is a better answer don't mind long code doesn't metter we need quaity not short code

    def firstMissingPositive(self, nums):
        if 1 not in nums:
            return 1
        n = len(nums)
        for i in range(n):
            if nums[i] > n or nums[i] <= 0:
                nums[i] = 1
        for i in range(n):
            a = abs(nums[i])
            if a == n:
                nums[0] = -abs(nums[0])
            else:
                nums[a] = -abs(nums[a])
        for i in range(2, n):
            if nums[i] > 0: return i
        return n if nums[0] > 0 else n+1

score 0 · Answer 10 · answered Sep 01 '23 at 14:19

Obligatory one liner:

min(set(range(1, len(l) + 2)).difference(l))

Explanation: the answer is necessary below the length of the pool + 1. Calculate the set of all possible numbers, remove the impossible ones, and finally take the smallest one.

Complexity : linear in time and space.

Tests:

def first_missing_positive(l):
    return min(set(range(1, len(l) + 2)).difference(l))

assert first_missing_positive([]) == 1
assert first_missing_positive([0]) == 1
assert first_missing_positive([1]) == 2
assert first_missing_positive([0, 1]) == 2
assert first_missing_positive([2]) == 1
assert first_missing_positive([0, 2]) == 1
assert first_missing_positive([1, 2]) == 3
assert first_missing_positive([0, 1, 2]) == 3

Rockybilly · Answer 11 · 2016-12-13T00:30:56.863

-1

Here is my solution.

from collections import defaultdict

def firstMissingPositive(A):

    d = defaultdict(int)
    for i in A:
        d[i] = 1

    j = 1
    while True:
        if d[j] == 0:
            return j
        j += 1

Space complexity: O(n)

Time complexity: O(n + k). Which is considered O(n). Also ignored hashing complexity.

By the way: Googling gives the answer you seek, constant space and O(n) time.

edited Dec 13 '16 at 00:30

answered Dec 13 '16 at 00:19

Rockybilly

2,938
1
13
38

mmm, if you think about it, our solution are very similar, the difference is that I use a set and some extra check, but the core idea is the same... – Copperfield Dec 13 '16 at 01:23
also, you don't need defaultdict, you can use a regular dict with dict.fromkeys(A) for the same effect, only that the check is different – Copperfield Dec 13 '16 at 01:29
@Copperfield I used it to shorten code. Which I always try to achieve if possible :D – Rockybilly Dec 13 '16 at 01:41

score -1 · Answer 12 · answered May 26 '20 at 21:17

Complexity: O(N) or O(N * log(N))
If you want to test on cordiality you will get 100% score
The idea behind is to use 1 loop and once dictionary to avoid the O(N^2) complexity
Time taken for the list mentioned in the question: 0:00:00.000098

def test(A):
    value = 1
    data = {}
    for num in A:
        if num < 0:
            continue
        elif num > 0 and num != value:
            data[num] = 1
            continue
        elif num == value:
            data[num] = 1
            while data.get(value, None) is not None:
                value += 1
    return value

Find the first missing positive integer in Python

12 Answers12

Code

Explanation

Analysis