I'm trying to avoid passing a recursive function an unchanging dataframe in every recursive call. I'm not actually certain this is an issue, though I could see it being so.
I've implemented a dummy version of my full recursive function (find_all_ns), which calls another function (find_ns) inside it. In the intended use case, find_ns compares a point against every row of a pandas dataframe, so find_ns requires the dataframe as an argument. I'm wondering whether there is a way to avoid passing the dataframe to find_all_ns on every recursive call, given that I never modify the dataframe. I'm not certain that passing it on each call is actually a problem, but I assume it could be. Note that a global variable df with find_ns(pt, data=df) preassigned would not work in the full case, because find_all_ns is meant to be callable on different dataframes.
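As a quick sanity check on whether the argument passing itself costs anything, here is a minimal sketch. It assumes a plain list as a stand-in for the dataframe, and `depth_check` is a hypothetical helper that is not part of my real code. Python passes a reference to the object rather than a copy, so every level of the recursion should see the very same object:

```python
# Stand-in for the large dataframe; a plain list keeps the sketch dependency-free.
big_data = list(range(1_000_000))

def depth_check(data, original, depth):
    """Recurse `depth` times and confirm the innermost call sees the same object."""
    if depth == 0:
        # `is` compares identity, so True means no copy was made along the way.
        return data is original
    return depth_check(data, original, depth - 1)

same_object = depth_check(big_data, big_data, 50)
print(same_object)  # True
```

If the same holds for a dataframe argument, the repeated argument would be more of a readability concern than a performance one.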
# Dummy version
data = [0, 1, 2, 3, 6]  # Real version is a ~large dataframe

def find_all_ns(neighbors, prev_checked):
    to_check = neighbors - prev_checked
    if len(to_check) == 0:
        return neighbors
    else:
        new_ns = set()
        for pt in to_check:
            nns = find_ns(pt)
            new_ns = new_ns.union(nns)
            prev_checked.add(pt)
        neighs = neighbors.union(new_ns)
        return find_all_ns(neighs, prev_checked)

def find_ns(pt):
    ns = set()
    for other_pt in data:  # real version needs to pass in the full dataframe
        if abs(other_pt - pt) <= 1:
            ns.add(other_pt)
    return ns

all_ns = find_all_ns({0}, set())
print(all_ns)
'''
find_neighbors is the full version of find_ns, and I will not be able to
change it to avoid requiring a dataframe (df). So it seems that I have to
pass find_all_ns a dataframe argument repeatedly - and this seems like it
could be a problem.
'''
def find_neighbors(dist_metric, epsi, df, pt):
    neighborhood = set()
    my_pt = df.loc[pt, :]  # pt is the index label of a row
    for index, row in df.iterrows():
        dist = dist_metric(my_pt, row)
        if dist <= epsi:  # keep only rows within epsi of my_pt
            neighborhood.add(index)
    return neighborhood
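One possible way to avoid threading the data through every recursive call is to bind it once and recurse in an inner function. The following is only a sketch under assumptions: it reuses the dummy list data, gives find_ns an explicit `data` parameter, and introduces a hypothetical inner helper `expand`; none of this is in my real code.

```python
from functools import partial

def find_ns(pt, data):
    """Return every value in `data` within distance 1 of `pt`."""
    return {other_pt for other_pt in data if abs(other_pt - pt) <= 1}

def find_all_ns(start, data):
    """Collect everything reachable from `start` by repeated neighbor lookups."""
    # Bind the data once; the recursive helper below never takes it as an argument.
    neighbor_fn = partial(find_ns, data=data)

    def expand(neighbors, prev_checked):
        to_check = neighbors - prev_checked
        if not to_check:
            return neighbors
        new_ns = set()
        for pt in to_check:
            new_ns |= neighbor_fn(pt)
            prev_checked.add(pt)
        return expand(neighbors | new_ns, prev_checked)

    return expand(set(start), set())

print(find_all_ns({0}, [0, 1, 2, 3, 6]))  # {0, 1, 2, 3}
print(find_all_ns({6}, [0, 1, 2, 3, 6]))  # {6}
```

Because the data is an argument of the outer call only, the same function can be used on different dataframes without a global. For the full version, the bound function could presumably be built as `partial(find_neighbors, dist_metric, epsi, df)`, since `pt` is the last positional parameter.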