Is a nested for-loop necessary for this code, or is there a more efficient work-around?
This is a simplified version which searches for successive, overlapping intervals within a data set comprised of 20 random integers from 1 to 1000. It runs through error values of 1-100 to create the intervals by adding/subtracting them from the 20 random integers.
Example:
input assuming data frame of size 10 instead of 20:
df = [433, 3, 4, 5, 6, 7, 378, 87, 0, 500]
output for error = 1 in for-loop:
overlaps = {0:[[1, 2, 3, 4, 5]]}
def find_overlap(df, error):
"""
df: dataframe with random 20 integers from 1-1000
error: used to create the interval by +/- to each value in the dataframe
returns: list of list of indexes overlapping
"""
# add the interval to the dataframe as columns of minimum and maximum
df["min"] = df["x"] - error
df["max"] = df["x"] + error
# overlaps stores lists of indexes that overlap
overlaps = []
# fill in data for start
temporary = [0]
minimum = df["min"].iloc[0]
maximum = df["min"].iloc[0]
# iterates through the dataframe checking for overlap between successive intervals
for index , row in df.iterrows():
current_min = row["min"]
current_max = row["max"]
# yes overlap
if (current_min <= maximum) and (current_max >= minimum):
temporary.append(index)
if current_min > minimum:
minimum = current_min
if current_max < maximum:
maximum = current_max
continue
# no overlap - also check for 5 successive overlaps
if len(temporary) >= 5:
overlaps.append(temporary)
temporary = [index]
minimum = current_min
maximum = current_max
return overlaps
# creates dataframe with 20 random integers from 1 to 1000
df = pd.DataFrame(np.random.randint(1, 1000, 20), columns=["x"])
overlaps = {}
for error in range(0,100):
lst = find_overlap(df, error)
if len(lst):
overlaps[error] = lst
print(overlaps)