
I have a long, sorted list of indices ranging from zero to approximately 4.3 million, with some indices missing from the range, for example:

mylist = [0, 1, 5, 7, 8, 9, 12 ... 4301981, 4301983]

I am looking for a quick way to obtain a sorted list of the numbers that are absent from this one, up to its maximum value:

newlist = [2, 3, 4, 6, 10, 11 ... 4301982]

I have tried the following:

newlist = []
for i in range(max(mylist)):
    if i not in mylist:
        newlist.append(i)

but given the size of my list, this is far too slow. Is there a quick way to do this for a large list of indices like mine?
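The loop is slow because `i not in mylist` scans the list from the start on every iteration, so the total work grows roughly with the size of the range multiplied by the length of the list. A minimal sketch of the same loop, assuming the only change is converting `mylist` to a `set` once so each membership test becomes an average constant-time hash lookup (the name `present` is illustrative):

```python
mylist = [0, 1, 5, 7, 8, 9, 12]

# Build the set once; set membership is an average O(1) hash lookup,
# whereas `in` on a list compares elements one by one.
present = set(mylist)

newlist = []
for i in range(max(mylist)):
    if i not in present:
        newlist.append(i)

print(newlist)  # [2, 3, 4, 6, 10, 11]
```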

    [This post](https://stackoverflow.com/questions/57591210/how-to-diff-the-two-files-using-python-generator/57744476) solves a similar problem. The difference is that the input there is from a file and has a larger range. – GZ0 Sep 26 '19 at 14:40
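Because the input is already sorted, the missing values can also be produced lazily in a single pass, which avoids building a set or list covering the whole range and works with any ascending iterable of integers (for instance `int(line)` for each line of a file, as in the linked post). A minimal sketch, assuming the values are sorted ascending and the range starts at 0; `iter_missing` is a hypothetical name:

```python
from typing import Iterable, Iterator

def iter_missing(sorted_values: Iterable[int], start: int = 0) -> Iterator[int]:
    """Yield integers >= start that are absent from an ascending sequence,
    stopping at the largest value seen (which is itself present)."""
    expected = start
    for value in sorted_values:
        # Every integer from the next expected value up to this one is missing.
        yield from range(expected, value)
        # max() keeps duplicate values from moving `expected` backwards.
        expected = max(expected, value + 1)

mylist = [0, 1, 5, 7, 8, 9, 12]
print(list(iter_missing(mylist)))  # [2, 3, 4, 6, 10, 11]
```

Since it is a generator, the missing indices can be written out or processed one at a time without holding the full result in memory.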

2 Answers


You could create a set from a range up to the highest value in the list and take its set.difference with the list. Iterating over a set does not guarantee ascending order, so sort the result:

mylist = [0, 1, 5, 7, 8, 9, 12]

sorted(set(range(max(mylist))).difference(mylist))
# [2, 3, 4, 6, 10, 11]
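An alternative that is not part of the answer above, and avoids hashing entirely, is to mark the present indices in a flag array and then collect the unmarked positions; the output comes out in ascending order with no sort step. A minimal sketch, assuming all values are non-negative integers and that roughly one byte per index in the range is acceptable memory use; `missing_by_mask` is a hypothetical name:

```python
from typing import List

def missing_by_mask(values: List[int]) -> List[int]:
    """Return the sorted integers in [0, max(values)) absent from values.
    Assumes every value is a non-negative int."""
    if not values:
        return []
    upper = max(values)
    # One byte per candidate index; 1 marks "present".
    seen = bytearray(upper + 1)
    for v in values:
        seen[v] = 1
    # Scanning indices in order yields the result already sorted.
    return [i for i in range(upper) if not seen[i]]

mylist = [0, 1, 5, 7, 8, 9, 12]
print(missing_by_mask(mylist))  # [2, 3, 4, 6, 10, 11]
```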
def missing_indices(mylist):
    missing_list = []
    for i in range(len(mylist) - 1):
        # A gap exists when the next index is more than one step ahead.
        if mylist[i + 1] - mylist[i] > 1:
            # Add every index strictly between the two neighbours.
            missing_list.extend(range(mylist[i] + 1, mylist[i + 1]))
    return missing_list

mylist = [0, 1, 5, 7, 8, 9, 12]
print(missing_indices(mylist))

[2, 3, 4, 6, 10, 11]
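Since the two answers compute the same result in different ways, a quick randomized check can confirm they agree. A minimal sketch, assuming both approaches are written as sorted set difference and gap filling between consecutive values, and that the input always contains 0 (gap filling only looks between existing values); the function names are hypothetical:

```python
import random

def missing_by_set(values):
    # Set difference over the full range, sorted for a stable order.
    return sorted(set(range(max(values))).difference(values))

def missing_by_gaps(values):
    # Fill in each gap between consecutive sorted values.
    missing = []
    for a, b in zip(values, values[1:]):
        missing.extend(range(a + 1, b))
    return missing

def check_agreement(trials=200, upper=1000):
    for _ in range(trials):
        # Random sorted sample of distinct indices that always includes 0.
        k = random.randint(1, upper)
        values = sorted(set(random.sample(range(1, upper), k - 1)) | {0})
        assert missing_by_set(values) == missing_by_gaps(values), values
    return True

print(check_agreement())
```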