0

I have two dataframes representing two monthly datasets. I am trying to roll back the values in df1 (current month) with the values in df2 (previous month) if the value in fb is Y. The actual dataset is thousands of records long and there are 30 or so columns that need to be reset/rolled back. The double for loops in the code below return null values in df1.

import pandas as pd

df1 = pd.DataFrame({
    'loan_num': [100, 101, 102, 103, 104, 105],
    'bal': [100000, 100000, 150000, 225000, 350000, 500000],
    'rate': [2.25, 3.75, 4.00, 4.25, 6.125, 2.50],
    'sfee': [0.25, 0.19, 0.25, 0.44, 0.25, 0.19],
    'fb': ['Y', 'N', 'N', 'Y', 'N', 'Y']
})

df2 = pd.DataFrame({
    'loan_num': [100, 101, 102, 103, 104, 105],
    'bal': [125000, 150000, 200000, 275000, 355000, 550000],
    'rate': [2.25, 3.75, 4.00, 4.25, 6.125, 2.50],
    'sfee': [0.30, 0.22, 0.27, 0.75, 0.25, 0.55],
    'fb': ['Y', 'N', 'N', 'Y', 'N', 'Y']
})

fb_loans = df1.loc[df1['fb'] == 'Y', 'loan_num'].tolist()
cols_to_reset = ['bal', 'sfee']

for loan in fb_loans:
    for col in cols_to_reset:
        df1.loc[df1['loan_num'] == loan, col] = df2.loc[df2['loan_num'] == loan, col]

If I set the right side of the assignment statement to an fixed value, the output is correct. I've also tried the following:

for loan in fb_loans:
     df1.loc[df1['loan_num'] == loan, 'bal'] = df2.loc[df2['loan_num'] == loan, 'bal']
     df1.loc[df1['loan_num'] == loan, 'sfee'] = df2.loc[df2['loan_num'] == loan, 'sfee']
for loan in fb_loans:
    for col in cols_to_reset:
        new_value = df2.loc[df2['loan_num'] == loan, col]
        df1.loc[df1['loan_num'] == loan, col] = new_value

Both methods produced null values for the columns of interest in df1.

My expected output for df1 is:

+----------+----------+---------+---------+-----+
| loan_num | bal      | rate    | sfee    | fb  |
+----------+----------+---------+---------+-----+
| 100      | 125000   | 2.25    | 0.30    | Y   |
| 101      | 100000   | 3.75    | 0.19    | N   |
| 102      | 150000   | 4.00    | 0.25    | N   |
| 103      | 275000   | 4.25    | 0.75    | Y   |
| 104      | 350000   | 6.125   | 0.25    | N   |
| 105      | 550000   | 2.50    | 0.55    | Y   |
+----------+----------+---------+---------+-----+
petezurich
  • 9,280
  • 9
  • 43
  • 57
gernworm
  • 322
  • 3
  • 16

2 Answers2

1

You are looking at merge:

df1.merge(df2, on=['loan_num'], how='left')

which gives you (the _y columns coming from df2):

   loan_num   bal_x  rate_x  sfee_x fb_x   bal_y  rate_y  sfee_y fb_y
0       100  100000   2.250    0.25    Y  125000   2.250    0.30    Y
1       101  100000   3.750    0.19    N  150000   3.750    0.22    N
2       102  150000   4.000    0.25    N  200000   4.000    0.27    N
3       103  225000   4.250    0.44    Y  275000   4.250    0.75    Y
4       104  350000   6.125    0.25    N  355000   6.125    0.25    N
5       105  500000   2.500    0.19    Y  550000   2.500    0.55    Y

Since you are merging on one key column, you can use map:

df2 = df2.set_index('loan_num')
df1['bal'] = df1['loan_num'].map(df2['bal'])
df1['sfee'] = df1['loan_num'].map(df2['sfee'])
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
1

If I understood you right:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'loan_num': [100, 101, 102, 103, 104, 105],
    'bal': [100000, 100000, 150000, 225000, 350000, 500000],
    'rate': [2.25, 3.75, 4.00, 4.25, 6.125, 2.50],
    'sfee': [0.25, 0.19, 0.25, 0.44, 0.25, 0.19],
    'fb': ['Y', 'N', 'N', 'Y', 'N', 'Y']
})

df2 = pd.DataFrame({
    'loan_num': [100, 101, 102, 103, 104, 105],
    'bal': [125000, 150000, 200000, 275000, 355000, 550000],
    'rate': [2.25, 3.75, 4.00, 4.25, 6.125, 2.50],
    'sfee': [0.30, 0.22, 0.27, 0.75, 0.25, 0.55],
    'fb': ['Y', 'N', 'N', 'Y', 'N', 'Y']
})
new_df = df1.merge(df2, on='loan_num', how = 'right', suffixes=('', '_df2'))
new_df['bal'] = np.where(new_df['fb']=='Y', new_df['bal_df2'], new_df['bal'])
new_df['rate'] = np.where(new_df['fb']=='Y', new_df['rate_df2'], new_df['rate'])
new_df['sfee'] = np.where(new_df['fb']=='Y', new_df['sfee_df2'], new_df['sfee'])
new_df = new_df.iloc[:, [not o.endswith('_df2') for o in new_df.columns]]
new_df
gtomer
  • 5,643
  • 1
  • 10
  • 21