0

I am new to Python and I am trying to merge two datasets for my research together:

  1. df1 has the column names: companyname, ticker, and Dscode,
  2. df2 has companyname, ticker, grouptcode, and Dscode.

I want to merge the grouptcode from df1 to df2, however, the companyname is slightly different, but very similar between the two dataframes.

For each ticker, there is an associated Dscode. However, multiple companies have the same ticker, and therefore the same Dscode.

Problem

I am only interested in merging the grouptcode for the associated ticker and Dscode that matches the companyname (which at times is slightly different - this part is what I cannot get past). The code I have been using is below.

Code

import pandas as pd
import os

# set working directory
path = "/Users/name/Desktop/Python"
os.chdir(path)
os.getcwd() # Prints the working directory

# read in excel file         
file = "/Users/name/Desktop/Python/Excel/DSROE.xlsx"
x1 = pd.ExcelFile(file)
print(x1.sheet_names)
df1 = x1.parse('Sheet1')
df1.head()
df1.tail()

file2 = "/Users/name/Desktop/Python/Excel/tcode2.xlsx"
x2 = pd.ExcelFile(file2)
print(x2.sheet_names)
df2 = x2.parse('Sheet1')
df2['companyname'] = df2['companyname'].str.upper() ## make column uppercase
df2.head()
df2.tail()

df2 = df2.dropna()            
x3 = pd.merge(df1, df2,how = 'outer') # merge   

Data

df1

       Dscode ticker                              companyname
65286  8933TC    3pl                      3P LEARNING LIMITED
79291  9401FP    a2m                  A2 MILK COMPANY LIMITED
1925   14424Q    aac  AUSTRALIAN AGRICULTURAL COMPANY LIMITED
39902  675493    aad                     ARDENT LEISURE GROUP
1400   133915    aba                     AUSWIDE BANK LIMITED
74565  922472    abc                ADELAIDE BRIGHTON LIMITED
7350   26502C    abp                    ABACUS PROPERTY GROUP
39202  675142    ada              ADACEL TECHNOLOGIES LIMITED
80866  9661AD    adh                                   ADAIRS
80341  9522QV    afg         AUSTRALIAN FINANCE GROUP LIMITED
45327  691938    agg                ANGLOGOLD ASHANTI LIMITED
2625   14880E    agi        AINSWORTH GAME TECHNOLOGY LIMITED
75090  923040    agl                       AGL ENERGY LIMITED
19251  29897X    ago                       ATLAS IRON LIMITED
64409  890588    agy                  ARGOSY MINERALS LIMITED
24151  31511D    ahg        AUTOMOTIVE HOLDINGS GROUP LIMITED
64934  8917JD    ahy                      ASALEO CARE LIMITED
42877  691152    aia   AUCKLAND INTERNATIONAL AIRPORT LIMITED
61433  88013C    ajd           ASIA PACIFIC DATA CENTRE GROUP
44452  691704    ajl                   AJ LUCAS GROUP LIMITED
700    13288C    ajm                    ALTURA MINING LIMITED
19601  29929D    akp            AUDIO PIXELS HOLDINGS LIMITED
79816  951404    alk                 ALKANE RESOURCES LIMITED
56008  865613    all               ARISTOCRAT LEISURE LIMITED
51807  771351    alq                              ALS LIMITED
44277  691685    alu                           ALTIUM LIMITED
42702  68625C    alx                      ATLAS ARTERIA GROUP
30101  41162F    ama                        AMA GROUP LIMITED
67386  902201    amc                            AMCOR LIMITED
33426  50431L    ami                   AURELIA METALS LIMITED

df2

        companyname                        grouptcode   ticker  
524  3P LEARNING LIMITED..                      tpn1    3pl   
1    THE A2 MILK COMPANY LIMITED                a2m1    a2m   
2    AUSTRALIAN AGRICULTURAL COMPANY LIMITED.   aac2    aac   
3    AAPC LIMITED.                              aad1    aad   
6    ADVANCE BANK AUSTRALIA LIMITED             aba1    aba   
7    ADELAIDE BRIGHTON CEMENT HOLDINGS LIMITED  abc1    abc   
8    ABACUS PROPERTY GROUP                      abp1    abp   
9    ADACEL TECHNOLOGIES LIMITED                ada1    ada   
288  ADA CORPORATION LIMITED                    khs1    ada   
10   AERODATA HOLDINGS LIMITED                  adh1    adh   
11   ADAMS (HERBERT) HOLDINGS LIMITED           adh2    adh   
12   ADAIRS LIMITED                             adh3    adh   
431  ALLCO FINANCE GROUP LIMITED                rcd1    afg   
13   AUSTRALIAN FINANCE GROUP LTD               afg1    afg   
14   ANGLOGOLD ASHANTI LIMITED                  agg1    agg   
15   APGAR INDUSTRIES LIMITED                   agi1    agi   
16   AINSWORTH GAME TECHNOLOGY LIMITED          agi2    agi   
17   AUSTRALIAN GAS LIGHT COMPANY (THE)         agl1    agl   
18   ATLAS IRON LIMITED                         ago1    ago   
393  ACM GOLD LIMITED                           pgo2    ago   
19   AUSTRALIAN GYPSUM INDUSTRIES LIMITED       agy1    agy   
142  ARGOSY MINERALS INC                        cio1    agy   
21   ARCHAEAN GOLD NL                           ahg1    ahg   
22   AUSTRALIAN HYDROCARBONS N.L.               ahy1    ahy   
23   ASALEO CARE LIMITED                        ahy2    ahy   
24   AUCKLAND INTERNATIONAL AIRPORT LIMITED     aia1    aia   
25   ASIA PACIFIC DATA CENTRE GROUP             ajd1    ajd   
26   AJ LUCAS GROUP LIMITED                     ajl1    ajl   
27   AJAX MCPHERSON'S LIMITED                   ajm1    ajm   
29   ALKANE EXPLORATION (TERRIGAL) N.L.         alk1    alk  

     Dscode  
524  8933TC  
1    9401FP  
2    14424Q  
3    675493  
6    133915  
7    922472  
8    26502C  
9    675142  
288  675142  
10   9661AD  
11   9661AD  
12   9661AD  
431  9522QV  
13   9522QV  
14   691938  
15   14880E  
16   14880E  
17   923040  
18   29897X  
393  29897X  
19   890588  
142  890588  
21   31511D  
22   8917JD  
23   8917JD  
24   691152  
25   88013C  
26   691704  
27   13288C  
29   951404
oceanbeach96
  • 604
  • 9
  • 19
  • Please provide a minimum working example and please don't post images of data. Instead give us a small excerpt of the dataframes as code. – JE_Muc Jul 10 '18 at 14:17
  • How do you add excerpt of dataframe as code ? – oceanbeach96 Jul 10 '18 at 14:18
  • Go see this answer: https://stackoverflow.com/a/32536193/7480990 Short answer: use .to_dict() – tobsecret Jul 10 '18 at 14:21
  • I think this is actually an extremely complicated problem, if you in fact need to merge based on a fuzzy match between strings. There is some advice here: https://stackoverflow.com/questions/13636848/is-it-possible-to-do-fuzzy-match-merge-with-python-pandas – ALollz Jul 10 '18 at 14:26
  • I have edited the data in the appropriate format :) – oceanbeach96 Jul 10 '18 at 15:24
  • Why do you need to merge as both already have *ticker* and *Dscode* fields? And why are the Stata .dta files relevant for this question? – Parfait Jul 10 '18 at 17:07
  • dta files not relevant here. Sorry. Should not be here. I need the grouptcode in df1, but there are multiple companies with the same ticker and Dscode in df2. However, I only want the grouptcode to merge over, if the ticker, companyname and dscode match. – oceanbeach96 Jul 10 '18 at 17:55

0 Answers0