Say there's a dataframe:
import pandas as pd
df = pd.DataFrame([1,2,3,4,5, 7,8, 10])
I want to find the "missing" numbers in it (6 and 9). My code to do this is:
li = []
low = int(min(df.values))
high = int(max(df.values))
for i in range(low, high+1):
    if i not in df.values:
        li.append(i)
print(li)
>>> [6, 9]
But if the dataframe is huge, the for loop is slow, since every `i not in df.values` check scans the whole array. In my case, with a dataframe of ~300k rows, it takes 162 seconds.
Is there a more efficient (vectorized?) way to do this?
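For reference, here is a minimal sketch of the kind of vectorized approach I have in mind. It assumes the values are integers in a single column (labelled 0, as in the example above) and uses NumPy's set difference in place of the loop. The names `values`, `full_range`, and `missing` are only illustrative, and I have not timed this on the 300k-row data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5, 7, 8, 10])

# Flatten the single column (label 0) to a 1-D integer array.
values = df[0].to_numpy()

# Every integer between the smallest and largest value, inclusive.
full_range = np.arange(values.min(), values.max() + 1)

# Sorted integers that appear in full_range but not in values.
missing = np.setdiff1d(full_range, values)
print(missing.tolist())  # [6, 9]
```

This replaces the repeated membership tests with a single set-difference call, so the array is not rescanned for each candidate number.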