
Here's my data set, in a file named bank1.txt:

Keyword:Category
ccn:fintech
credit:fintech
smart:fintech

Here's my second data set, in a file named bank2.txt:

Keyword:Category
mcm:mcm
switching:switching
pul-sim:pulsa
transfer:transfer
debit sms:money transfer

What I want to get:

 Keyword     Category_all
 mcm           mcm
 switching     switching
 pul-sim       pulsa
 transfer      transfer
 debit sms     money transfer
 ccn           fintech
 credit        fintech
 smart         fintech

What I did:

import pandas as pd

with open('entity_dict.txt') as f:  # bank.txt
    content = f.readlines()
    content = [x.strip() for x in content]

def ambil(inp):
    try:
        out = []
        for x in content:      
            if x in inp:
                out.append(x)

        if len(out) == 0:
            return 'other'
        else:
            output = ' '.join(out)
            return output

    except:
        return 'other'

frame_institution['Keyword'] = frame_institution['description'].apply(ambil)
fintech = pd.read_csv('bank.txt', sep=":")
frame_Keyword = pd.merge(frame_institution, fintech, on='Keyword')

Then, for bank2.txt, the code is:

with open('entity_dict2.txt') as f: 
    content2 = f.readlines()
    content2 = [x.strip() for x in content2 ]

def ambil2(inp):
    try:
        out = []
        for x in content2:      
            if x in inp:
                out.append(x)

        if len(out) == 0:
            return 'other'
        else:
            output = ' '.join(out)
            return output
    except:
        return 'other'

frame_institution['Keyword2'] = frame_institution['description'].apply(ambil2)
fintech2 = pd.read_csv('bank2.txt', sep=":")
frame_Keyword2 = pd.merge(frame_institution, fintech, on='Keyword')
frame_Keyword2 = pd.merge(frame_Keyword2, fintech2, on='Keyword2')

Then I filter for some keywords:

frame_Keyword2[frame_Keyword2['Category_all'] == 'pulsa']

The actual result is:

Keyword     Category_all
 mcm           mcm
 switching     switching
 ccn           fintech
 credit        fintech
 smart         fintech

But 'pulsa', 'transfer', and 'money transfer' do not appear in Category_all. I think there is a better way to solve this.


stovfl

1 Answer


Simply try with merge:

DataFrame 1:

>>> df1
  Keyword Category
0     ccn  fintech
1  credit  fintech
2   smart  fintech

DataFrame 2:

>>> df2
     Keyword        Category
0        mcm             mcm
1  switching       switching
2    pul-sim           pulsa
3   transfer        transfer
4  debit sms  money transfer

Result, using an outer merge:

>>> pd.merge(df1, df2, how='outer')
     Keyword        Category
0        ccn         fintech
1     credit         fintech
2      smart         fintech
3        mcm             mcm
4  switching       switching
5    pul-sim           pulsa
6   transfer        transfer
7  debit sms  money transfer
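
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the two files look exactly as shown in the question; the `BANK1` and `BANK2` strings are inline stand-ins for `bank1.txt` and `bank2.txt`, so swap in the real paths when reading from disk.

```python
import io

import pandas as pd

# Inline copies of the two files from the question (assumption: the real
# files have this exact layout). With real files, pass the path to
# pd.read_csv instead of an io.StringIO object.
BANK1 = """Keyword:Category
ccn:fintech
credit:fintech
smart:fintech
"""

BANK2 = """Keyword:Category
mcm:mcm
switching:switching
pul-sim:pulsa
transfer:transfer
debit sms:money transfer
"""

df1 = pd.read_csv(io.StringIO(BANK1), sep=":")
df2 = pd.read_csv(io.StringIO(BANK2), sep=":")

# With no `on=` argument, merge joins on every column the frames share
# (here both Keyword and Category); how="outer" keeps rows from either side.
merged = pd.merge(df1, df2, how="outer")

# Row order may differ between pandas versions, so inspect the content
# rather than relying on positions.
print(merged)
```

Note that the default `how='inner'` would return an empty frame here, since no row appears in both files; that is why `how='outer'` is needed.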

Other solutions are added below for posterity, in case someone lands here with a similar question:

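With the DataFrame.append() method (deprecated in pandas 1.4 and removed in pandas 2.0, so prefer pd.concat() on current versions):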

df1.append(df2, ignore_index=True)

With pd.concat():

pd.concat([df1, df2], ignore_index=True)

Or build a list of frames first and then concat:

frames = [df1,df2]
pd.concat(frames, ignore_index=True)
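
To get the exact table the question asks for (bank2 rows first, and a column named Category_all), a sketch along these lines may help. The in-memory frames below are stand-ins for the two files, and `combined` and `pulsa` are just illustrative names:

```python
import pandas as pd

# Hypothetical in-memory stand-ins for bank1.txt and bank2.txt.
df1 = pd.DataFrame({
    "Keyword": ["ccn", "credit", "smart"],
    "Category": ["fintech", "fintech", "fintech"],
})
df2 = pd.DataFrame({
    "Keyword": ["mcm", "switching", "pul-sim", "transfer", "debit sms"],
    "Category": ["mcm", "switching", "pulsa", "transfer", "money transfer"],
})

# Stack bank2 on top of bank1, as in the question's desired table, and
# rename the category column to the name used there.
combined = (
    pd.concat([df2, df1], ignore_index=True)
      .rename(columns={"Category": "Category_all"})
)

# The filter from the question, applied to the combined lookup table.
pulsa = combined[combined["Category_all"] == "pulsa"]
print(pulsa)
```

Unlike the outer merge, concat keeps the input order and does not collapse rows that occur in both frames, so add `.drop_duplicates()` if the two files can overlap.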
Karn Kumar