
I'm trying to avoid passing the same, unchanging dataframe to a recursive function on every recursive call. I'm not certain this is actually a problem, though I can see how it might be.

I've implemented a dummy version of my full recursive function (find_all_ns), which calls another function (find_ns) inside it. In the intended use case, find_ns compares a point against every row of a pandas dataframe, so it requires the dataframe as an argument. Is there a way to avoid calling my recursive function find_all_ns with the dataframe every time? I'm not modifying the data in the dataframe at all. I'm not certain that passing the dataframe to the recursive function is actually a problem, but I would assume so. Note that a global variable df with find_ns(pt, data=df) preassigned would not work in the full case, because find_all_ns is meant to be callable on different dataframes.
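For context on why the preassigned-default route breaks when the dataframe varies: a Python default value is evaluated once, when the def statement runs, so later rebinding the global has no effect on it. A minimal sketch, using plain lists as hypothetical stand-ins for dataframes:

```python
df = [0, 1, 2]


def find_ns(pt, data=df):  # the default is evaluated once, when `def` runs
    return {x for x in data if abs(x - pt) <= 1}


df = [10, 11, 12]  # rebinding the global afterwards...
print(sorted(find_ns(11)))  # ...does not change the default: prints []
print(sorted(find_ns(11, data=df)))  # prints [10, 11, 12]
```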

```python
# Dummy version
data = [0, 1, 2, 3, 6]  # the real version is a ~large dataframe

def find_all_ns(neighbors, prev_checked):
    # Points found so far whose own neighbors have not been looked up yet
    to_check = neighbors - prev_checked
    if len(to_check) == 0:
        return neighbors
    else:
        new_ns = set()
        for pt in to_check:
            nns = find_ns(pt)
            new_ns = new_ns.union(nns)
            prev_checked.add(pt)
        neighs = neighbors.union(new_ns)
        return find_all_ns(neighs, prev_checked)

def find_ns(pt):
    ns = set()
    for other_pt in data:  # the real version needs the full dataframe passed in
        if abs(other_pt - pt) <= 1:
            ns.add(other_pt)
    return ns

all_ns = find_all_ns({0}, set())
print(all_ns)
```
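One way to keep the dataframe out of the recursive signature without a global is a closure: the outer function receives the data once, and the nested helpers read it from the enclosing scope. A minimal sketch, assuming the data is a plain list as in the dummy version; find_all_ns_in is a hypothetical name, and for a real DataFrame only the body of the inner find_ns would change.

```python
def find_all_ns_in(data, start):
    """Return every point reachable from `start` through chains of neighbors in `data`."""

    def find_ns(pt):
        # `data` comes from the enclosing scope; it is neither copied nor passed
        return {other for other in data if abs(other - pt) <= 1}

    def expand(neighbors, prev_checked):
        to_check = neighbors - prev_checked
        if not to_check:
            return neighbors
        new_ns = set()
        for pt in to_check:
            new_ns |= find_ns(pt)
            prev_checked.add(pt)
        return expand(neighbors | new_ns, prev_checked)

    return expand({start}, set())


print(sorted(find_all_ns_in([0, 1, 2, 3, 6], 0)))  # [0, 1, 2, 3]
print(sorted(find_all_ns_in([10, 11, 13, 14], 13)))  # [13, 14]
```

Each call of find_all_ns_in creates fresh helpers bound to its own data, so the same function works on different datasets.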



find_neighbors is the full version of find_ns, and I can't change it to avoid requiring a dataframe (df). So it seems that I have to pass find_all_ns a dataframe argument on every call, and this seems like it could be a problem:

```python
def find_neighbors(dist_metric, epsi, df, pt):
    neighborhood = set()
    my_pt = df.loc[pt, :]  # pt is the index label of a row
    for index, row in df.iterrows():
        dist = dist_metric(my_pt, row)
        if dist <= epsi:  # keep only rows within distance epsi of pt
            neighborhood.add(index)
    return neighborhood
```
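Since find_neighbors itself can't change, another option is to bind its unchanging arguments with functools.partial and give the recursive function a one-argument callable, so the recursion never handles the dataframe directly. A sketch under stated assumptions: find_all_ns_generic and find_neighbors_list are hypothetical names, the latter being a list-based stand-in with the same argument order as find_neighbors so the example runs without pandas. With the real function, the binding would be partial(find_neighbors, dist_metric, epsi, df).

```python
from functools import partial


def find_all_ns_generic(neighbors, prev_checked, neighbors_of):
    """Grow `neighbors` until closed; `neighbors_of(pt)` returns the neighbor set of pt."""
    to_check = neighbors - prev_checked
    if not to_check:
        return neighbors
    new_ns = set()
    for pt in to_check:
        new_ns |= neighbors_of(pt)
        prev_checked.add(pt)
    return find_all_ns_generic(neighbors | new_ns, prev_checked, neighbors_of)


# Hypothetical list-based stand-in with the same argument order as find_neighbors
def find_neighbors_list(dist_metric, epsi, data, pt):
    my_pt = data[pt]  # pt is an index, as in the DataFrame version
    return {i for i, other in enumerate(data) if dist_metric(my_pt, other) <= epsi}


data = [0, 1, 2, 3, 6]
neighbors_of = partial(find_neighbors_list, lambda a, b: abs(a - b), 1, data)
print(sorted(find_all_ns_generic({0}, set(), neighbors_of)))  # [0, 1, 2, 3]
```

Passing neighbors_of down each call costs no more than passing the dataframe would; the gain is only that the recursive function works unchanged for any dataframe or distance metric.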
Evan Mata
  • Are you under the impression that parameter passing involves a data copy? It doesn't. It may help to read up on [how objects and variables interact in Python](https://nedbatchelder.com/text/names.html). – user2357112 Dec 31 '18 at 18:33
  • So passing the dataframe object would just be passing a pointer repeatedly and hence not really an issue? – Evan Mata Dec 31 '18 at 18:47
  • Does this answer your question? [How do I pass a variable by reference?](https://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference) – ggorlen Aug 15 '20 at 02:34
  • Yes, passing any object into a function only uses the space necessary to store the new variable, which is virtually none. These variables all point to the same object as they would in assignment. – ggorlen Aug 15 '20 at 02:36
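The point made in the comments can be checked directly: every recursive call receives the same object, not a copy. A minimal sketch, using a large list as a hypothetical stand-in for a DataFrame (DataFrames are ordinary Python objects and are passed the same way); recurse and touch are illustrative names.

```python
def recurse(obj, depth, seen_ids):
    # Record the identity of the object this call received
    seen_ids.append(id(obj))
    if depth > 0:
        recurse(obj, depth - 1, seen_ids)
    return seen_ids


big = list(range(1_000_000))  # stand-in for a large DataFrame
ids = recurse(big, 50, [])
print(len(ids), all(i == id(big) for i in ids))  # 51 True


def touch(obj):
    obj.append("seen")  # mutates the caller's object in place


touch(big)
print(big[-1])  # seen
```

The second half shows the flip side: an in-place change made inside any call is visible to every caller, so sharing df across calls is harmless here only because find_ns never modifies it.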
