31

Suppose we have two items missing in a sequence of consecutive integers and the missing elements lie between the first and last elements. I did write a code that does accomplish the task. However, I wanted to make it efficient using less loops if possible. Any help will be appreciated. Also what about the condition when we have to find more missing items (say close to n/4) instead of 2. I think then my code should be efficient right because I am breaking out from the loop earlier?

def missing_elements(L,start,end,missing_num):
    complete_list = range(start,end+1)
    count = 0
    input_index = 0
    for item  in  complete_list:
        if item != L[input_index]:
            print item
            count += 1
        else :
            input_index += 1
        if count > missing_num:
            break



def main():
    L = [10,11,13,14,15,16,17,18,20]
    start = 10
    end = 20
    missing_elements(L,start,end,2)



if __name__ == "__main__":
    main()
Mehdi Nellen
  • 8,486
  • 4
  • 33
  • 48
vkaul11
  • 4,098
  • 12
  • 47
  • 79

17 Answers17

67

If the input sequence is sorted, you could use sets here. Take the start and end values from the input list:

def missing_elements(L):
    start, end = L[0], L[-1]
    return sorted(set(range(start, end + 1)).difference(L))

This assumes Python 3; for Python 2, use xrange() to avoid building a list first.

The sorted() call is optional; without it a set() is returned of the missing values, with it you get a sorted list.

Demo:

>>> L = [10,11,13,14,15,16,17,18,20]
>>> missing_elements(L)
[12, 19]

Another approach is by detecting gaps between subsequent numbers; using an older itertools library sliding window recipe:

from itertools import islice, chain

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def missing_elements(L):
    missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
    return list(missing)

This is a pure O(n) operation, and if you know the number of missing items, you can make sure it only produces those and then stops:

def missing_elements(L, count):
    missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
    return list(islice(missing, 0, count))

This will handle larger gaps too; if you are missing 2 items at 11 and 12, it'll still work:

>>> missing_elements([10, 13, 14, 15], 2)
[11, 12]

and the above sample only had to iterate over [10, 13] to figure this out.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • 2
    I'm quite sure the input list is already sorted, as he named that a sequence. From what I got from the OP's question, he's trying to make it less than `O(n)`. – Rubens Jun 06 '13 at 23:53
  • 1
    @Rubens: The output of the difference between sets is a set, which does not have a defined order. Returning `sorted()` on the set returns a sorted list, however. – Martijn Pieters Jun 06 '13 at 23:54
  • 1
    @Rubens: and I am skeptical that the OP is aware enough of big-Oh notation to ask for better-than-O(n) unless I see that explicitly stated. – Martijn Pieters Jun 06 '13 at 23:57
  • Yes Rubens, I am trying to get more efficient algorithm. Martijn, so my method in terms of complexity is as efficient as you can get or you think there is some trick that help make it more efficient? Does not your method have an additional penalty if I had to find 5 missing items instead of 2 because I can sometime break from the loop earlier and I don't need sorting in my code. I know if we have one number missing difference in sums can help. – vkaul11 Jun 07 '13 at 00:00
  • @vkaul11: Your statements are contradicting a little; you know beforehand how many items are missing? So if 4 are missing, you only have to find those 4 and then you are done? – Martijn Pieters Jun 07 '13 at 00:05
  • Martijn, I am aware of the big-Oh notation and I wanted to know if the code can be faster/more efficient than the code I wrote. Obviously it can be more compact (like squiguy has written) but I wanted something faster or equally fast and compact. – vkaul11 Jun 07 '13 at 00:07
  • @Martijn, yes I know before hand how many items are missing, that is right. How expensive is the operation of converting the list to a set? The total time in your algorithm is the operation of converting the list to a set (O(n)?), taking a difference of two sets (which I guess is O(n)?) and then doing the final sort O(logn). Is that right? What exactly are we gaining by using a set here compared to my code? – vkaul11 Jun 07 '13 at 00:14
  • 1
    @vkaul11: Updated to give you a O(n) algorithm, with an option to bail out early if the number of missing elements is known up front. – Martijn Pieters Jun 07 '13 at 00:16
  • @vkaul11: Turning a sequence into a set is a O(n) operation. – Martijn Pieters Jun 07 '13 at 00:18
  • @Martijin: Thanks I changed my code also to take number of missing values? Isn't my code as efficient as yours? – vkaul11 Jun 07 '13 at 00:22
  • @vkaul11: Both are O(n), provided you are using Python 3, but I think mine beats yours on constant costs. We'd have to use the `timeit` module to measure actual performance here, with a spread of input scenarios. – Martijn Pieters Jun 07 '13 at 00:27
  • You mean the algorithm u gave that does not use sets? In Python2.7 which one will be better, mine or yours? I learned a lot from this discussion and thanks Martijin. – vkaul11 Jun 07 '13 at 00:45
  • @vkaul11: In 2.7 `range()` materializes a full list, so that is a loop right there. Use `xrange()` to just create a range object (like python 3 `range()` would do) instead. – Martijn Pieters Jun 07 '13 at 06:47
12

Assuming that L is a list of integers with no duplicates, you can infer that the part of the list between start and index is completely consecutive if and only if L[index] == L[start] + (index - start) and similarly with index and end is completely consecutive if and only if L[index] == L[end] - (end - index). This combined with splitting the list into two recursively gives a sublinear solution.

# python 3.3 and up, in older versions, replace "yield from" with yield loop

def missing_elements(L, start, end):
    if end - start <= 1: 
        if L[end] - L[start] > 1:
            yield from range(L[start] + 1, L[end])
        return

    index = start + (end - start) // 2

    # is the lower half consecutive?
    consecutive_low =  L[index] == L[start] + (index - start)
    if not consecutive_low:
        yield from missing_elements(L, start, index)

    # is the upper part consecutive?
    consecutive_high =  L[index] == L[end] - (end - index)
    if not consecutive_high:
        yield from missing_elements(L, index, end)

def main():
    L = [10,11,13,14,15,16,17,18,20]
    print(list(missing_elements(L,0,len(L)-1)))
    L = range(10, 21)
    print(list(missing_elements(L,0,len(L)-1)))

main()
Lie Ryan
  • 62,238
  • 13
  • 100
  • 144
  • Thanks @Lie Ryan. I will try to look at the yield from function that I am not familiar with. – vkaul11 Jun 07 '13 at 02:00
  • 3
    I love the smell of recursion in the morning – Edgar H Mar 06 '20 at 14:29
  • How would this be faster than just checking each item in a simple for loop? You still have to check every adjacent item, you're just doing it with a bunch of recursion thrown in there, which would slow it down further. Unless I'm missing something. – rcronk Mar 03 '21 at 00:17
  • @rcronk "still have to check every adjacent item" Nope, that's incorrect, with the recursive solution, only ranges that have holes need to be checked, that's why this is sublinear. If there's `k` holes, my recursive solution will find that hole `O(k*log(n)) where k <= n`, if `k` is 1 or if all the holes are grouped together in one chunk, then this will find the hole in `O(log(n))`. Or to simplify, it's `O(log(n))`. The best case scenario is when there are no holes, where it will verify that in `O(1)` because it only need to check exactly two numbers. – Lie Ryan Mar 03 '21 at 00:52
  • @rcronk Now, if you're trying to implement this with in real life machines, some complications do arise, real life machines like linear loops as they're easy for the branch predictor, memory prefetcher likes memory locality, and real life machines also can do parallel processing. So a real life implementation will likely use a hybrid approach where it starts with this algorithm, queue up ranges for parallel processing pool, and then switch to linear search when the ranges have been divided small enough. But in the pure algorithmic world, you don't need to do any of that for optimal solution. – Lie Ryan Mar 03 '21 at 00:57
  • @rcronk: in real life solution, you'll also have to consider that Python loops and function calls are gosh darn expensive. So an asymptotically worst solution like Martijn Pieters' solution might actually run much faster because it uses C loops. My solution is what your algorithms professor expects in algorithm classes, Martijn's solution is what you should actually implement if your company have this problem to solve, and they need the report yesterday. – Lie Ryan Mar 03 '21 at 01:12
  • @LieRyan - But how do you know a range has holes without checking it first? If there's a list of 10 items with no holes in it, which two items do you check to know there are no holes in it in O(1) time? I'm definitely missing something here. – rcronk Mar 09 '21 at 19:07
  • @rcronk: If the list is sorted and contains no duplicates, that means if `L[start] == L[end] - (end - start)`, then it must contain all consecutive numbers. You only need to check `L[start]` and `L[end]`. If the list contains any holes, then `end - start > L[end] - L[start]`. – Lie Ryan Mar 15 '21 at 22:41
  • @LieRyan - I apologize. I completely missed the part where the list was sorted. That makes complete sense. Thanks. – rcronk Mar 17 '21 at 04:24
3
missingItems = [x for x in complete_list if not x in L]
squiguy
  • 32,370
  • 6
  • 56
  • 63
3

 a=[1,2,3,7,5,11,20]
 b=[]
 def miss(a,b):
     for x in range (a[0],a[-1]):
        if x not in a:
            b.append(x)
     return b
 print (miss(a,b))

ANS:[4, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19]

works for sorted,unsorted , with duplicates too

Avinash Dalvi
  • 8,551
  • 7
  • 27
  • 53
  • The range to be `range(a[0],a[-1]+1)` if you want it to include the last element of your initial list of numbers. But overall I like this solution a lot, seems simple compared to some of the other solutions. – Brad Root Nov 16 '20 at 22:38
2

Using collections.Counter:

from collections import Counter

dic = Counter([10, 11, 13, 14, 15, 16, 17, 18, 20])
print([i for i in range(10, 20) if dic[i] == 0])

Output:

[12, 19]
aldeb
  • 6,588
  • 5
  • 25
  • 48
2
arr = [1, 2, 5, 6, 10, 12]
diff = []

"""zip will return array of tuples (1, 2) (2, 5) (5, 6) (6, 10) (10, 12) """
for a, b in zip(arr , arr[1:]):
    if a + 1 != b:
        diff.extend(range(a+1, b))

print(diff)

[3, 4, 7, 8, 9, 11]

If the list is sorted we can lookup for any gap. Then generate a range object between current (+1) and next value (not inclusive) and extend it to the list of differences.

Argentum
  • 73
  • 6
1

Using scipy lib:

import math
from scipy.optimize import fsolve

def mullist(a):
    mul = 1
    for i in a:
        mul = mul*i
    return mul

a = [1,2,3,4,5,6,9,10]
s = sum(a)
so = sum(range(1,11))
mulo = mullist(range(1,11))
mul = mullist(a)
over = mulo/mul
delta = so -s
# y = so - s -x
# xy = mulo/mul
def func(x):
    return (so -s -x)*x-over

print int(round(fsolve(func, 0))), int(round(delta - fsolve(func, 0)))

Timing it:

$ python -mtimeit -s "$(cat with_scipy.py)" 

7 8

100000000 loops, best of 3: 0.0181 usec per loop

Other option is:

>>> from sets import Set
>>> a = Set(range(1,11))
>>> b = Set([1,2,3,4,5,6,9,10])
>>> a-b
Set([8, 7])

And the timing is:

Set([8, 7])
100000000 loops, best of 3: 0.0178 usec per loop
0x90
  • 39,472
  • 36
  • 165
  • 245
0

Here's a one-liner:

In [10]: l = [10,11,13,14,15,16,17,18,20]

In [11]: [i for i, (n1, n2) in enumerate(zip(l[:-1], l[1:])) if n1 + 1 != n2]
Out[11]: [1, 7]

I use the list, slicing to offset the copies by one, and use enumerate to get the indices of the missing item.

For long lists, this isn't great because it's not O(log(n)), but I think it should be pretty efficient versus using a set for small inputs. izip from itertools would probably make it quicker still.

acjay
  • 34,571
  • 6
  • 57
  • 100
  • Is that not readable? It's all pretty basic builtin Python. In code I'd check in, I'd probably choose better identifier names, but then again, maybe not if it's all self-contained in one line. – acjay Jun 06 '13 at 23:54
  • I would just do it over a few lines, but that's just me :). – squiguy Jun 06 '13 at 23:54
  • I agree, if I'm spilling over 80 chars, I usually break on `in` and `if` in my list comprehensions. – acjay Jun 06 '13 at 23:56
  • this may produce an unexpected result if the missing numbers are consecutive. A second pass is needed to check for that. PS: using basic features of the language isn't the same as good readability. – Lie Ryan Jun 07 '13 at 00:04
  • @LieRyan Good call on the consecutive numbers. I really don't think the way I wrote it is so esoteric. If you understand enumerate and zip, it pretty much reads from left-to-right. Your solution (while awesome, sublinear, and more correct) is not really more readable in my opinion – acjay Jun 07 '13 at 02:33
  • @acjohnson: to clarify, I don't find your code particularly unreadable; I'm simply correcting the incorrect assertion that "it's all pretty basic built-in Python" means that it's readable. – Lie Ryan Jun 07 '13 at 17:15
0

My take was to use no loops and set operations:

def find_missing(in_list):
    complete_set = set(range(in_list[0], in_list[-1] + 1))
    return complete_set - set(in_list)

def main():
    sample = [10, 11, 13, 14, 15, 16, 17, 18, 20]
    print find_missing(sample)

if __name__ == "__main__":
    main()

# => set([19, 12])
mVChr
  • 49,587
  • 11
  • 107
  • 104
  • 1
    To build the sets, and to calculate the difference, Python still has to employ loops. These are just hidden in C code. They'll be faster than Python loops, and the set difference is an efficient operation, but the loops are still there. – Martijn Pieters Jun 06 '13 at 23:56
  • @MartijnPieters True enough, and I see you beat me to it anyway. Thanks for elaborating. – mVChr Jun 06 '13 at 23:58
0

Simply walk the list and look for non-consecutive numbers:

prev = L[0]
for this in L[1:]:
    if this > prev+1:
        for item in range(prev+1, this):    # this handles gaps of 1 or more
            print item
    prev = this
Brent Washburne
  • 12,904
  • 4
  • 60
  • 82
  • That's how I'd do it in a language without some of Python's syntactic power, but that would be a fairly plodding way to do it in Python. – acjay Jun 07 '13 at 00:01
  • Agreed. I like your one-liner better. – Brent Washburne Jun 07 '13 at 00:05
  • OTOH, this code is about the easiest to read, and it doesn't require any other libraries, sets, or syntactic tools. It would be trivial to add a counter to break out of the loop. – Brent Washburne Jun 07 '13 at 06:22
0

We found a missing value if the difference between two consecutive numbers is greater than 1:

>>> L = [10,11,13,14,15,16,17,18,20]
>>> [x + 1 for x, y in zip(L[:-1], L[1:]) if y - x > 1]
[12, 19]

Note: Python 3. In Python 2 use itertools.izip.

Improved version for more than one value missing in a row:

>>> import itertools as it
>>> L = [10,11,14,15,16,17,18,20] # 12, 13 and 19 missing
>>> [x + diff for x, y in zip(it.islice(L, None, len(L) - 1),
                              it.islice(L, 1, None)) 
     for diff in range(1, y - x) if diff]
[12, 13, 19]
Mike Müller
  • 82,630
  • 20
  • 166
  • 161
0
>>> l = [10,11,13,14,15,16,17,18,20]
>>> [l[i]+1 for i, j in enumerate(l) if (l+[0])[i+1] - l[i] > 1]
[12, 19]
dansalmo
  • 11,506
  • 5
  • 58
  • 53
0
def missing_elements(inlist):
    if len(inlist) <= 1:
        return []
    else:
        if inlist[1]-inlist[0] > 1:
            return [inlist[0]+1] + missing_elements([inlist[0]+1] + inlist[1:])
        else:
            return missing_elements(inlist[1:])
englealuze
  • 1,445
  • 12
  • 19
0

First we should sort the list and then we check for each element, except the last one, if the next value is in the list. Be carefull not to have duplicates in the list!

l.sort()

[l[i]+1 for i in range(len(l)-1) if l[i]+1 not in l]
Victor Stanescu
  • 451
  • 3
  • 9
  • 2
    Code-only answers are often unhelpful in pointing to *why* the issue happened. You should include an explanation why it solves the issue. Please read [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer) – FluffyKitten Aug 30 '17 at 12:11
0

With this code you can find any missing values in a sequence, except the last number. It in only required to input your data into excel file with column name "numbers".

import pandas as pd
import numpy as np

data = pd.read_excel("numbers.xlsx")

data_sort=data.sort_values('numbers',ascending=True)
index=list(range(len(data_sort)))
data_sort['index']=index
data_sort['index']=data_sort['index']+1
missing=[]

for i in range (len(data_sort)-1):
    if data_sort['numbers'].iloc[i+1]-data_sort['numbers'].iloc[i]>1:
        gap=data_sort['numbers'].iloc[i+1]-data_sort['numbers'].iloc[i]
        numerator=1
        for j in range (1,gap):          
            mis_value=data_sort['numbers'].iloc[i+1]-numerator
            missing.append(mis_value)
            numerator=numerator+1
print(np.sort(missing))
  • The question is asking for a generic algorithm. Your answer is too implementation dependent. – Udo E. Aug 19 '19 at 02:41
0

I stumbled on this looking for a different kind of efficiency -- given a list of unique serial numbers, possibly very sparse, yield the next available serial number, without creating the entire set in memory. (Think of an inventory where items come and go frequently, but some are long-lived.)

def get_serial(string_ids, longtail=False):
  int_list = map(int, string_ids)
  int_list.sort()
  n = len(int_list)
  for i in range(0, n-1):
    nextserial = int_list[i]+1
    while nextserial < int_list[i+1]:
      yield nextserial
      nextserial+=1
  while longtail:
    nextserial+=1
    yield nextserial
[...]
def main():
  [...]
  serialgenerator = get_serial(list1, longtail=True)
  while somecondition:
    newserial = next(serialgenerator)

(Input is a list of string representations of integers, yield is an integer, so not completely generic code. longtail provides extrapolation if we run out of range.)

There's also an answer to a similar question which suggests using a bitarray for efficiently handling a large sequence of integers.

Some versions of my code used functions from itertools but I ended up abandoning that approach.

arp
  • 299
  • 3
  • 7
0

A bit of mathematics and we get a simple solution. The below solution works for integers from m to n. Works for both sorted and unsorted postive and negative numbers.

#numbers = [-1,-2,0,1,2,3,5]
numbers = [-2,0,1,2,5,-1,3]

sum_of_nums =  0
max = numbers[0]
min = numbers[0]
for i in numbers:
    if i > max:
        max = i
    if i < min:
        min = i
    sum_of_nums += i

# Total : sum of numbers from m to n    
total = ((max - min + 1) * (max + min)) / 2

# Subtract total with sum of numbers which will give the missing value
print total - sum_of_nums
Chetan
  • 1,217
  • 2
  • 13
  • 27