0

I'm once again stuck on one of my Python assignments, this one requiring me to merge four different data frames together. While the first three are not difficult and just require me to concatenate them, it's the fourth one that has me stumped.

While the first three involved stock data of three different stocks from 2017-2019, the fourth data frame combines all three in a much longer column, that includes years from 2014-2019. I find myself unable to combine the four together in a way that makes sense. I will post what I'm trying to accomplish, as well as the data frames in question. Does anybody have any idea of how to approach this?

The frame I'm having trouble with is the "data4" frame. I will attach code of what I'm trying to do.

import pandas as pd

data1 = {'tic':['PYPL','PYPL','PYPL'],
        'fyear':['2017','2018','2019'],
        'asset':[40774,43332,51333],
        'debt':[0,0,5368],
        'ni':[1795,2057,2459],
       }
data2 = {'tic':['SQ','SQ','SQ'],
        'fyear':['2017','2018','2019'],
        'asset':[2187.27,3281.023,4551.258],
        'debt':[362.404,1051.141,1047.662],
        'ni':[-62.813,-38.453,375.446],
       }
data3 = {'tic':['GPN','GPN','GPN'],
        'fyear':['2017','2018','2019'],
        'asset':[12998.069,13230.774,44480.162],
        'debt':[4559.408,5015.168,9487.852],
        'ni':[468.425,452.053,430.613],
       }
data4 = {'fyear':['2015','2016','2017','2018','2019','2015','2016','2017','2018','2019','2015','2016'],
         'tic':['PYPL','PYPL','PYPL','PYPL','PYPL','SQ','SQ','SQ','SQ','SQ','GPN','GPN'],
         'price':[36.2,39.47,73.62,84.09,108.17,13.09,13.63,34.67,56.09,62.56,104.38,77.69],
    
}
PYPL = pd.DataFrame(data1)
SG = pd.DataFrame (data2)
gpn = pd.DataFrame (data3)
prc = pd.DataFrame (data4)

frame = pd.concat([PYPL,SG,gpn])

pd.concat([frame,prc])

This does not have the intended effect. I'm trying to get something akin to this.

tic fyear asset debt ni price
0 PYPL 2017 40774.000 0.000 1795.000 73.62
1 PYPL 2018 43332.000 0.000 2057.000 84.09
2 PYPL 2019 51333.000 5368.000 2459.000 108.17
3 SQ 2017 2187.270 362.404 -62.813 34.67
4 SQ 2018 3281.023 1051.140 -38.453 56.09
5 SQ 2019 4551.258 1047.662 375.446 62.56
6 GPN 2017 12998.069 4559.408 468.425 100.24
7 GPN 2018 13230.774 5015.168 452.053 103.13
8 GPN 2019 44480.162 9487.852 430.613 182.56

2 Answers2

0

Try this using merge, by default, merge will "join" on columns with the same name in both dataframes and does an inner join:

frame.merge(prc)

Output:

    tic fyear      asset      debt        ni   price
0  PYPL  2017  40774.000     0.000  1795.000   73.62
1  PYPL  2018  43332.000     0.000  2057.000   84.09
2  PYPL  2019  51333.000  5368.000  2459.000  108.17
3    SQ  2017   2187.270   362.404   -62.813   34.67
4    SQ  2018   3281.023  1051.141   -38.453   56.09
5    SQ  2019   4551.258  1047.662   375.446   62.56
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
0
...
#taking into account, year and tic
result=frame.merge(prc,on=['fyear','tic'])
print(result)
    tic fyear      asset      debt        ni   price
0  PYPL  2017  40774.000     0.000  1795.000   73.62
1  PYPL  2018  43332.000     0.000  2057.000   84.09
2  PYPL  2019  51333.000  5368.000  2459.000  108.17
3    SQ  2017   2187.270   362.404   -62.813   34.67
4    SQ  2018   3281.023  1051.141   -38.453   56.09
5    SQ  2019   4551.258  1047.662   375.446   62.56
sygneto
  • 1,761
  • 1
  • 13
  • 26