In Python 3, you could use the nonlocal keyword:
import pandas as pd

def outer_method():
    # ... do outer-scope stuff here
    df = pd.DataFrame(columns=['A', 'B', 'C', 'D'])

    def recursive_method(arg):
        nonlocal df
        # ... do local stuff here
        # func returns a DataFrame to be appended to the outer df
        results_df = func(arg)
        df = df.append(results_df, ignore_index=True)
        return results_df

    return df
But note that calling df.append
returns a new DataFrame each time and thus requires copying all the old data into the new DataFrame. If you do this inside a loop N times, you end up making on the order of 1+2+3+...+N = O(N^2) copies -- very bad for performance.
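To get a feel for the difference, here is a rough timing sketch (the function names and toy row data are made up for illustration; pd.concat is used for the row-by-row case so the snippet also runs on pandas versions where DataFrame.append has been removed, but the copying behavior is the same):

import timeit
import pandas as pd

def grow_row_by_row(n):
    # Re-binding df to a freshly concatenated result copies all prior rows every iteration: O(N^2)
    df = pd.DataFrame(columns=['A', 'B', 'C', 'D'])
    for i in range(n):
        row = pd.DataFrame([[i, i, i, i]], columns=['A', 'B', 'C', 'D'])
        df = pd.concat([df, row], ignore_index=True)
    return df

def grow_with_list(n):
    # Collect the pieces in a plain list and copy once at the end: O(N)
    pieces = [pd.DataFrame([[i, i, i, i]], columns=['A', 'B', 'C', 'D']) for i in range(n)]
    return pd.concat(pieces, ignore_index=True)

for f in (grow_row_by_row, grow_with_list):
    print(f.__name__, timeit.timeit(lambda: f(500), number=3))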
If you do not need df
inside recursive_method
for any purpose other than
appending, it is better to append to a list, and then construct the
DataFrame (by calling pd.concat
once) after recursive_method
is done:
df = pd.DataFrame(columns=['A', 'B', 'C', 'D'])
data = [df]

def recursive_method(arg, data):
    # ... do stuff here
    # func returns a DataFrame; collect it in the list instead of appending to df
    results_df = func(arg)
    data.append(results_df)
    return results_df

recursive_method(arg, data)
df = pd.concat(data, ignore_index=True)
This is the best solution if all you need to do is collect data inside
recursive_method
and can wait to construct the new df
after
recursive_method
is done.
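As a concrete sketch of that pattern (the nested-dict tree and the collect_rows helper are invented just for illustration), a recursion that flattens a small tree of records could look like this:

import pandas as pd

def collect_rows(node, data):
    # Build one single-row DataFrame per node and stash it in the shared list;
    # the full DataFrame is constructed only once, after the recursion finishes.
    data.append(pd.DataFrame([{'name': node['name'], 'value': node['value']}]))
    for child in node.get('children', []):
        collect_rows(child, data)

tree = {'name': 'root', 'value': 0,
        'children': [{'name': 'a', 'value': 1},
                     {'name': 'b', 'value': 2,
                      'children': [{'name': 'c', 'value': 3}]}]}

data = []
collect_rows(tree, data)
df = pd.concat(data, ignore_index=True)
print(df)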
In Python 2, if you must use df inside recursive_method, then you could pass df as an argument to recursive_method, and return df too:
df = pd.DataFrame(columns=['A', 'B', 'C', 'D'])

def recursive_method(arg, df):
    # ... do stuff here; recurse as needed, threading df through each call:
    #     results, df = recursive_method(next_arg, df)
    # func returns a DataFrame to be appended to df
    results_df = func(arg)
    df = df.append(results_df, ignore_index=True)
    return results_df, df

results, df = recursive_method(arg, df)
but be aware that you will be paying a heavy price doing the O(N^2) copying
mentioned above.
Why DataFrames can not (or at least should not) be appended to in-place:
The underlying data in a DataFrame is stored in NumPy arrays. The data in a
NumPy array comes from a contiguous block of memory. Sometimes there is not
enough space to resize the NumPy arrays to a larger contiguous block of memory
even if memory is available -- imagine the array being sandwiched in between
other data structures. In that case, in order to resize the array, a new larger
block of memory has to be allocated somewhere else and all the data from the
original array has to be copied to the new block. In general, it can't be done
in-place.
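One quick way to see that growing an array means a fresh allocation (a tiny illustrative snippet, nothing more): the address of the underlying buffer changes when you concatenate, because the result is written into newly allocated memory:

import numpy as np

arr = np.arange(5)
before = arr.__array_interface__['data'][0]   # address of arr's buffer

grown = np.concatenate([arr, [5, 6, 7]])      # allocates a larger block and copies
after = grown.__array_interface__['data'][0]

print(before == after)   # False: the grown array lives in a different block of memory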
DataFrames
do have a private method, _update_inplace
, which could be
used to redirect a DataFrame's underlying data to new data. This is only a
pseudo-inplace operation, since the new data (think NumPy arrays) has to be
allocated (with all the attendant copying) first. So using _update_inplace
has
two strikes against it: it uses a private method which (in theory) may not be
around in future versions of Pandas, and it incurs the O(N^2) copying penalty.
In [231]: df = pd.DataFrame([[0,1,2]])
In [232]: df
Out[232]:
0 1 2
0 0 1 2
In [233]: df._update_inplace(df.append([[3,4,5]]))
In [234]: df
Out[234]:
0 1 2
0 0 1 2
0 3 4 5