1

I am trying to create a new column on an existing dataframe based on values of another dataframe.

# Define a dataframe containing 2 columns Date-Year and Date-Qtr
data1 = {'Date-Year': [2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2017, 2017], 
        'Date-Qtr': ['2015Q1', '2015Q2', '2015Q3', '2015Q4', '2016Q1', '2016Q2', '2016Q3', '2016Q4', '2017Q1', '2017Q2']} 

dfx = pd.DataFrame(data1)

# Define another dataframe containing 2 columns Date-Year and Interest Rate
data2 = {'Date-Year': [2000, 2015, 2016, 2017, 2018, 2019, 2020, 2021], 
        'Interest Rate': [0.00, 8.20, 8.20, 7.75, 7.50, 7.50, 6.50, 6.50]} 

dfy = pd.DataFrame(data2)

# Add 1 more column to the first dataframe
dfx['Int-rate'] = float(0)

Output for dfx

   Date-Year Date-Qtr  Int-rate
0       2015   2015Q1       0.0
1       2015   2015Q2       0.0
2       2015   2015Q3       0.0
3       2015   2015Q4       0.0
4       2016   2016Q1       0.0
5       2016   2016Q2       0.0
6       2016   2016Q3       0.0
7       2016   2016Q4       0.0
8       2017   2017Q1       0.0
9       2017   2017Q2       0.0

Output for dfy

Date-Year  Interest Rate
0       2000           0.00
1       2015           8.20
2       2016           8.20
3       2017           7.75
4       2018           7.50
5       2019           7.50
6       2020           6.50
7       2021           6.50

Now I need to update the column 'Int-rate' of dfx by picking up the value for 'Interest Rate' from dfy for its corresponding year which I am achieving through 2 FOR loops

#Check the year from dfx - goto dfy - check the interest rate from dfy for that year and modify Int-rate of dfx with this value
for i in range (len(dfx['Date-Year'])):
    for j in range (len(dfy['Date-Year'])):
        if (dfx['Date-Year'][i] == dfy['Date-Year'][j]):
            dfx['Int-rate'][i] = dfy['Interest Rate'][j] 

and I get the desired output

  Date-Year Date-Qtr  Int-rate
0       2015   2015Q1      8.20
1       2015   2015Q2      8.20
2       2015   2015Q3      8.20
3       2015   2015Q4      8.20
4       2016   2016Q1      8.20
5       2016   2016Q2      8.20
6       2016   2016Q3      8.20
7       2016   2016Q4      8.20
8       2017   2017Q1      7.75
9       2017   2017Q2      7.75

Is there a way I can achieve the same output

  1. without declaring dfx['Int-rate'] = float(0). I get a KeyError: 'Int-rate'if I don't declare this

  2. not very happy with the 2 FOR loops. Is it possible to get it done in a better way (like using map or merge or joins)

I have tried looking through other posts and the best one I found is here, tried using map but I could not do it. Any help will be appreciated

thanks

Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58

2 Answers2

1

You could use replace with a dictionary:

dfx['Int-Rate'] = dfx['Date-Year'].replace(dict(dfy.to_numpy()))
print(dfx)

Output

   Date-Year Date-Qtr  Int-Rate
0       2015   2015Q1      8.20
1       2015   2015Q2      8.20
2       2015   2015Q3      8.20
3       2015   2015Q4      8.20
4       2016   2016Q1      8.20
5       2016   2016Q2      8.20
6       2016   2016Q3      8.20
7       2016   2016Q4      8.20
8       2017   2017Q1      7.75
9       2017   2017Q2      7.75

Or with a Series as an alternative:

dfx['Int-Rate'] = dfx['Date-Year'].replace(dfy.set_index('Date-Year').squeeze())
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
1

You can simply use df.merge:

In [4448]: df = dfx.merge(dfy).rename(columns={'Interest Rate':'Int-rate'})

In [4449]: df
Out[4449]: 
   Date-Year Date-Qtr  Int-rate
0       2015   2015Q1      8.20
1       2015   2015Q2      8.20
2       2015   2015Q3      8.20
3       2015   2015Q4      8.20
4       2016   2016Q1      8.20
5       2016   2016Q2      8.20
6       2016   2016Q3      8.20
7       2016   2016Q4      8.20
8       2017   2017Q1      7.75
9       2017   2017Q2      7.75
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58