
I'd like the dataframe passed into this function to be modified.

import pandas as pd

def func(df):
    left_df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
    right_df = pd.DataFrame([[5, 6], [7, 8]], columns=['C', 'D'])
    df = pd.merge(left_df, right_df, how='outer', left_index=True, right_index=True)
    print("df is now a merged dataframe!")

test = pd.DataFrame()
func(test)

However, since Python passes by value, the callee func() gets a copy of df that points to the original empty dataframe. When df is assigned the result of pd.merge(), it points to that new object instead, while test is unchanged and continues pointing to the original empty dataframe.

How can we merge in place inside func() so that test is actually changed? I'd like something like pandas.DataFrame.update(), but that only supports left joins.

haudarren

2 Answers


IIUC, something like this?

import pandas as pd

def func(df):
    # Declare test as the module-level name so the assignment below rebinds it
    global test
    left_df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
    right_df = pd.DataFrame([[5, 6], [7, 8]], columns=['C', 'D'])
    df = pd.merge(left_df, right_df, how='outer', left_index=True, right_index=True)
    print("df is now a merged dataframe!")
    test = df

test = pd.DataFrame()
func(test)
print(test)

Output:

df is now a merged dataframe!
   A  B  C  D
0  1  2  5  6
1  3  4  7  8
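
If the global is a concern, one common alternative is to return the merged frame and let the caller rebind its own name. The sketch below is only an illustration; build_merged() is a hypothetical name that does not appear in the question.

```python
import pandas as pd


def build_merged():
    # Hypothetical helper: returns the merged frame instead of assigning to a global
    left_df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
    right_df = pd.DataFrame([[5, 6], [7, 8]], columns=['C', 'D'])
    return pd.merge(left_df, right_df, how='outer', left_index=True, right_index=True)


test = pd.DataFrame()
test = build_merged()  # the caller rebinds test explicitly, so the change is easy to trace
print(test)
```

This does not modify the original object in place; it simply makes the rebinding visible at the call site.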
Scott Boston
  • This definitely works! According to [this post](https://stackoverflow.com/questions/38895768/python-pandas-dataframe-is-it-pass-by-value-or-pass-by-reference), using global variables can make it difficult to track where changes occurred, but this is the best solution I've seen. – haudarren Oct 14 '17 at 00:48

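Python does not pass by value! The function receives a reference to the same object the caller holds, so in-place changes made inside the function are visible outside it.
NOTE: Modifying an argument in place like this is bad coding practice in general.
PROOF: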

import pandas as pd

test = pd.DataFrame([[1, 2], [3, 4]])

def func(df):
    # Writes into the existing object rather than rebinding the name df
    df.loc[:] = df * 2

print(test)
func(test)
print(test)

   0  1
0  1  2
1  3  4

   0  1
0  2  4
1  6  8
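
For contrast, here is a minimal sketch of what happens when the function assigns to the name instead of writing into the object. The function rebind() is hypothetical and exists only for this illustration; it uses the `is` operator to check whether two names refer to the same object.

```python
import pandas as pd

test = pd.DataFrame([[1, 2], [3, 4]])


def rebind(df):
    same_before = df is test  # df and test name the same object on entry
    df = df * 2               # builds a new frame and rebinds only the local name
    same_after = df is test   # df now names the new frame, not the caller's
    return same_before, same_after


result = rebind(test)
print(result)
print(test)  # still holds the original values
```
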

Your issue is that assigning to df inside the function only rebinds the local name df to a new object; it does not touch the object that test refers to. You need to alter the dataframe in place somehow.

test = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

def func(df):
    df.loc[:, 'C'] = 9

print(test)
func(test)
print(test)

   A  B
0  1  2
1  3  4

   A  B  C
0  1  2  9
1  3  4  9
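
Applied to the merge in the question, the same idea might look like the sketch below. It assumes the frame passed in starts out empty, as in the question; with a non-empty frame, column assignment aligns on the existing index, so the result could differ.

```python
import pandas as pd


def func(df):
    left_df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
    right_df = pd.DataFrame([[5, 6], [7, 8]], columns=['C', 'D'])
    merged = pd.merge(left_df, right_df, how='outer', left_index=True, right_index=True)
    # Assumption: df is empty on entry, so each assignment adds a column
    # to the caller's object instead of rebinding the local name
    for col in merged.columns:
        df[col] = merged[col]


test = pd.DataFrame()
func(test)
print(test)
```

Because func() never assigns to the name df itself, test and df keep referring to the same object throughout.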
piRSquared