I have two dataframes- df1 having columns like ISIN, Name, Weight and df2 having columns like Short Name, ISIN.
df1 =
ISIN Name Weight
Enbridge Inc 0.1
UDR Inc 1.1
Tyson Foods Inc 1.9
and df2=
Short Name ISIN
Enbridge Inc. bvefj154
UDR Group iuhb38g7
Tyson Foods Pvt Ltd. hruidf12
I have developed a fuzzy logic which will match Name
and Short Name
from df1 and df2.
so it using the logic it will know that Enbridge Inc from both dataframes are one and the same. and for UDR Group and UDR Inc they are also same since names are matching, not all but almost all.
I am looking for a way to populate the ISIN column
in df1 based on the logic that if the names match(Enbridge Inc matches in all) then select the ISIN for the respective Short Name
from df2 and add it to ISIN column in df1 wherever relevant name is present.
So the output that I look forward to would look like this: df1=
ISIN Name Weight
bvefj154 Enbridge Inc 0.1
iuhb38g7 UDR Inc 1.1
hruidf12 Tyson Foods Inc 1.9
using pandas's merge
function I tried achieving the task but got a an error like:
KeywordError:'Name'
here's the code for the same.
import pandas as pd
df1 = pd.merge(df1, df2, on=['Name', '% Weight'], how='right')
How can I do this? Please help.
EDIT: Here is the code for fuzzy logic matching using fuzzywuzzy
module
def fuzzy_merge(df_1, df_2, key1, key2, threshold=90, limit=1):
"""
:param df_1: the left table to join
:param df_2: the right table to join
:param key1: key column of the left table
:param key2: key column of the right table
:param threshold: how close the matches should be to return a match, based on Levenshtein distance
:param limit: the amount of matches that will get returned, these are sorted high to low
:return: dataframe with boths keys and matches
"""
s = df_2[key2].tolist()
m = df_1[key1].apply(lambda x: process.extract(x, s, limit=limit))
df_1['matches'] = m
m2 = df_1['matches'].apply(lambda x: ', '.join([i[0] for i in x if i[1] >= threshold]))
df_1['matches'] = m2
print(df_1)
df_1.to_csv('fuzzy-1390-match.csv')
#return df_1
fuzzy_merge(df1, df2, 'Name', 'Short Name', threshold=90)
Output:
5,Enbridge Inc Flt 07/15/80 Sr:20-A,0.0127,ENBRIDGE INC
6,Enbridge Inc. 6.25% 03/01/78,0.0122,ENBRIDGE INC
7,Emera 6.75% 6/15/76-26,0.0113,MERA
8,Scentre Group Trust 2 Flt 09/24/80 Sr:144A,0.011,SCENTRE GROUP
9,Credit Suisse Group AG 7.5 Perp,0.0106,
10,Aegon Funding Corp Ii 5.100% 12/15/49,0.0101,
11,Dte Energy Co 5.250% 12/01/77 Sr:E,0.01,DTE ENERGY CO
12,Dai-Ichi Life Insurance 4%,0.0099,
13,Southern Co Flt 09/15/51 Sr:21-A,0.0098,SOUTHERN CO
EDIT2:
This is the dataframe(df1 and df2): df1=
0 Transcanada Trust 5.875 08/15/76 0.0176
1 Bp Capital Markets Plc Flt Perp 0.0169
2 Transcanada Trust Flt 09/15/79 0.0169
3 Bp Capital Markets Plc Flt Perp 0.0155
4 Prudential Financial 5.375% 5/15/45 0.0150
5 Enbridge Inc Flt 07/15/80 Sr:20-A 0.0127
6 Enbridge Inc. 6.25% 03/01/78 0.0122
7 Emera 6.75% 6/15/76-26 0.0113
8 Scentre Group Trust 2 Flt 09/24/80 Sr:144A 0.0110
9 Credit Suisse Group AG 7.5 Perp 0.0106
10 Aegon Funding Corp Ii 5.100% 12/15/49 0.0101
11 Dte Energy Co 5.250% 12/01/77 Sr:E 0.0100
12 Dai-Ichi Life Insurance 4% 0.0099
13 Southern Co Flt 09/15/51 Sr:21-A 0.0098
14 Prudential Financial 5.625% 6/15/43 0.0097
15 Southern Co 4.950% 01/30/80 Sr:2020 0.0093
16 Scentre Group Trust 2 Flt 09/24/80 Sr:144A 0.0093
17 Metlife Inc 9.25% 4/8/2038 144A 0.0089
18 American Intl Group 8.175% 5/15/58 0.0086
19 Southern Co Flt 01/15/51 Sr:B 0.0079
df2=
Short Name ISIN
0 ABU DHABI COMMER AEA000201011
1 ABU DHABI NATION AEA002401015
2 ABU DHABI NATION AEA006101017
3 ADNOC DRILLING C AEA007301012
4 ALPHA DHABI HOLD AEA007601015
5 DUBAI ISLAMIC AED000201015
6 EMAAR PROP PJSC AEE000301011
7 ETISALAT AEE000401019
8 EMIRATES NBD PJS AEE000801010
9 INTL HOLDING CO AEI000201014
10 FIRST ABU DHABI AEN000101016
11 SCHLUMBERGER LTD AN8068571086
12 ERSTE GROUP BANK AT0000652011
13 OMV AG AT0000743059
14 VERBUND AG AT0000746409
15 ARISTOCRAT LEISU AU000000ALL7
16 AUST AND NZ BANK AU000000ANZ3
17 AFTERPAY LTD AU000000APT1
18 ASX LTD AU000000ASX7
19 BHP GROUP LTD AU000000BHP4