2

I have two dataframes. One dataframe contains financial figures of different companies and the other one contains the corresponding industry codes. I would now like to add the corresponding industry codes to the companies. The problem is that the financial figures are contained for several years and therefore the companies are contained in one dataframe more often than in the other. Both dataframes don't have the same length and therefore I cannot simply merge them.

Of course this code is not correct, but hopefully it helps making my point clear.

dataframe1['Industry'] = dataframe2['Industry'] if dataframe1['name'] = dataframe2['name']

Dataframe1

The column names are: Year Cusip Name Current Assets Trade Liabilities

    2010.0  825690100   SHUTTERSTOCK INC    90480.0     4680.0
    2011.0  825690100   SHUTTERSTOCK INC    180740.0    18380.0

Dataframe2

column names: Cusip IndustryCode

521893107   3714
605288208   2873
549463107   3661
783755101   3462
74377P203   4833

As I said, I just want to add the industry codes to the dataframe1 so I have the financial ratios and Industry Codes for all companies and for every year.

BR

2 Answers2

0

I have change one value Cusip for this example:

df1 = pd.DataFrame({
    'Year':[2010,2011],
    'Cusip':[825690100,825690100],
    'Name':['SHUTTERSTOCK INC','SHUTTERSTOCK INC'],
    'Current_Assets':[90480.0,180740.0],
    'Trade Liabilities':[4680,18380]
})
df2 = pd.DataFrame({
    'Cusip':[825690100,605288208,549463107,783755101,'74377P203'],
    'IndustryCode':[3714,2873,3661,3462,4833]
})
df= df1.merge(df2,on='Cusip')
print(df)
   Year      Cusip              Name  Current_Assets  Trade Liabilities  IndustryCode
0  2010  825690100  SHUTTERSTOCK INC         90480.0               4680          3714
1  2011  825690100  SHUTTERSTOCK INC        180740.0              18380          3714
sygneto
  • 1,761
  • 1
  • 13
  • 26
0

The answer by sygneto is great. I just want to add a situation where you have duplicates in the second data, and it will affect the output (I added some duplicates):

df1 = pd.DataFrame({
  'Year':[2010,2011],
  'Cusip':[825690100,825690100],
  'Name':['SHUTTERSTOCK INC','SHUTTERSTOCK INC'],
  'Current_Assets':[90480.0,180740.0],
  'Trade Liabilities':[4680,18380]
 })
df2 = pd.DataFrame({
  'Cusip':[825690100,825690100,605288208,549463107,783755101,'74377P203'],
  'IndustryCode':[3714,3714,2873,3661,3462,4833]
})

# to avoid effect of possible duplicates in the outcome
df2 = df2.drop_duplicates(subset='Cusip', keep="last")

df = pd.merge(df1,df2,on='Cusip', how = 'left') 
Sam S.
  • 627
  • 1
  • 7
  • 23