Here's my data set in the bank1.txt file:
Keyword:Category
ccn:fintech
credit:fintech
smart:fintech
Here's my data set in the bank2.txt file:
Keyword:Category
mcm:mcm
switching:switching
pul-sim:pulsa
transfer:transfer
debit sms:money transfer
What I want to do
    Keyword      Category_all
    mcm          mcm
    switching    switching
    pul-sim      pulsa
    transfer     transfer
    debit sms    money transfer
    ccn          fintech
    credit       fintech
    smart        fintech
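For reference, the combined table above could be built directly by stacking the two files with pandas. This is only a minimal sketch: the inline strings stand in for bank1.txt and bank2.txt, and the names bank1, bank2, and category_all are hypothetical, chosen for illustration.

```python
import io

import pandas as pd

# Inline stand-ins for bank1.txt and bank2.txt (same content as shown above).
# With the real files, pass the file paths to pd.read_csv instead.
bank1_text = "Keyword:Category\nccn:fintech\ncredit:fintech\nsmart:fintech\n"
bank2_text = (
    "Keyword:Category\nmcm:mcm\nswitching:switching\npul-sim:pulsa\n"
    "transfer:transfer\ndebit sms:money transfer\n"
)

bank1 = pd.read_csv(io.StringIO(bank1_text), sep=":")
bank2 = pd.read_csv(io.StringIO(bank2_text), sep=":")

# Stack bank2 on top of bank1 and rename the category column.
category_all = (
    pd.concat([bank2, bank1], ignore_index=True)
    .rename(columns={"Category": "Category_all"})
)
print(category_all)
```

Because both files share the same two columns, no merge key is needed here; the rows are simply appended.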
What I did is:

    import pandas as pd

    with open('entity_dict.txt') as f:  # keywords from bank.txt
        content = f.readlines()
    content = [x.strip() for x in content]

    def ambil(inp):
        try:
            out = []
            for x in content:
                if x in inp:
                    out.append(x)
            if len(out) == 0:
                return 'other'
            else:
                output = ' '.join(out)
                return output
        except:
            return 'other'

    frame_institution['Keyword'] = frame_institution['description'].apply(ambil)
    fintech = pd.read_csv('bank.txt', sep=":")
    frame_Keyword = pd.merge(frame_institution, fintech, on='Keyword')
Then, for bank2.txt, the code is:

    with open('entity_dict2.txt') as f:
        content2 = f.readlines()
    content2 = [x.strip() for x in content2]

    def ambil2(inp):
        try:
            out = []
            for x in content2:
                if x in inp:
                    out.append(x)
            if len(out) == 0:
                return 'other'
            else:
                output = ' '.join(out)
                return output
        except:
            return 'other'

    frame_institution['Keyword2'] = frame_institution['description'].apply(ambil2)
    fintech2 = pd.read_csv('bank2.txt', sep=":")
    frame_Keyword2 = pd.merge(frame_institution, fintech, on='Keyword')
    frame_Keyword2 = pd.merge(frame_Keyword2, fintech2, on='Keyword2')
Then I filter for one of the categories:

    frame_Keyword2[frame_Keyword2['category_all'] == 'pulsa']
The actual result is:

    Keyword      Category_all
    mcm          mcm
    switching    switching
    ccn          fintech
    credit       fintech
    smart        fintech
But 'pulsa', 'transfer', and 'money transfer' do not appear in Category_all. I think there is a better way to solve this.
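One possible simplification, offered as a sketch rather than a definitive fix: put the keywords from both files into a single lookup and assign the category in one pass, so no chained pd.merge calls are needed. Note that pd.merge defaults to an inner join (how='inner'), so rows whose Keyword or Keyword2 value has no match in the lookup table are dropped, which may be part of why some categories go missing. The inline strings, the sample descriptions, and the names keyword_to_category and first_category are assumptions for illustration; when a description contains several keywords, this version keeps only the first match.

```python
import io

import pandas as pd

# Inline stand-ins for bank1.txt and bank2.txt (assumed to match the files above).
bank1_text = "Keyword:Category\nccn:fintech\ncredit:fintech\nsmart:fintech\n"
bank2_text = (
    "Keyword:Category\nmcm:mcm\nswitching:switching\npul-sim:pulsa\n"
    "transfer:transfer\ndebit sms:money transfer\n"
)

# One lookup table built from both files.
lookup = pd.concat(
    [pd.read_csv(io.StringIO(bank2_text), sep=":"),
     pd.read_csv(io.StringIO(bank1_text), sep=":")],
    ignore_index=True,
)
keyword_to_category = dict(zip(lookup["Keyword"], lookup["Category"]))


def first_category(description):
    """Return the category of the first keyword found, else 'other'."""
    if not isinstance(description, str):  # e.g. None or NaN
        return "other"
    for keyword, category in keyword_to_category.items():
        if keyword in description:
            return category
    return "other"


# Hypothetical sample data standing in for the real frame_institution.
frame_institution = pd.DataFrame({
    "description": [
        "pul-sim top up",
        "debit sms to account",
        "credit card payment",
        "unknown text",
        None,
    ]
})
frame_institution["Category_all"] = (
    frame_institution["description"].apply(first_category)
)

# Filtering now works on a single column.
pulsa_rows = frame_institution[frame_institution["Category_all"] == "pulsa"]
print(pulsa_rows)
```

Unmatched descriptions stay in the frame with the category 'other' instead of being dropped, so the filter sees every row.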