Situation: I have a dataframe that is used as input for several functions, each of which should return a modified copy of the input dataframe.
Question: How do I set up the functions so that they do not modify the original (input) dataframe when they run?
Example:
df_input = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
df_input
   a  b
0  1  4
1  2  5
2  3  6
def new_func(df):
    df_out = df
    df_out['new'] = 'C'
    return df_out
df_output = new_func(df_input)
df_output
   a  b new
0  1  4   C
1  2  5   C
2  3  6   C
df_input
   a  b new
0  1  4   C
1  2  5   C
2  3  6   C
The desired state is for only df_output to have the added column; df_input should stay unchanged.
It's probably very straightforward, but any pointers or suggestions would be much appreciated!
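For reference, here is a minimal sketch of one common way to reach the desired state, assuming a full copy of the data is acceptable for the dataframe sizes involved. The assignment `df_out = df` only binds a second name to the same object, so the new column lands on the input as well. Calling `df.copy()` inside the function gives it an independent dataframe to modify. The function name `new_func` and the column `'new'` are taken from the example above.

```python
import pandas as pd

def new_func(df):
    # Work on an independent copy so the caller's dataframe is left untouched.
    df_out = df.copy()
    df_out['new'] = 'C'
    return df_out

df_input = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_output = new_func(df_input)

print(list(df_input.columns))   # ['a', 'b']
print(list(df_output.columns))  # ['a', 'b', 'new']
```

For a function that only adds columns, `return df.assign(new='C')` may be a shorter alternative, since `assign` returns a new dataframe rather than modifying the one it is called on.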