I have tried to match two data frames on three columns, namely "ticker", "year", and "quarter". While the first dataset contains two observations (i.e. two rows) for each ticker, year, quarter, the second only contains one row... especially, the first dataframe always captures two different speakers in the same quarter while the second dataframe only captures firm information. Now, I want to match the dataframe so that each of the two rows of the first data frame within the same firm quarter are merged with the one relevant row in the second dataframe.
My data looks as follows: df
ticker year quarter exec_lname jobposition
XX 2009 3 A CEO
XX 2009 3 B CFO
XX 2009 4 A CEO
XX 2009 4 B CFO
YY 2007 1 C CEO
YY 2007 1 D CFO
YY 2007 2 C CEO
YY 2007 2 D CFO
ZZ 2008 3 F CEO
ZZ 2008 3 G CFO
dfnew
ticker year quarter eps calldate
XX 2009 3 x Mar
XX 2009 4 y Apr
YY 2007 1 z Feb
YY 2007 2 a Jan
ZZ 2008 3 b Dec
At the end, it should look like this:
ticker year quarter exec_lname jobposition eps calldate
XX 2009 3 A CEO x Mar
XX 2009 3 B CFO x Mar
XX 2009 4 A CEO y Apr
XX 2009 4 B CFO y Apr
YY 2007 1 C CEO z Feb
YY 2007 1 D CFO z Feb
YY 2007 2 C CEO a Jan
YY 2007 2 D CFO a Jan
ZZ 2008 3 F CEO b Dec
ZZ 2008 3 G CFO b Dec
I tried:
dfjoin = pd.merge(dfnew, df, how='left', left_on=['ticker', "year", "quarter"], right_on = ['ticker', "year", "quarter"])
but it returns the new dataset with all the right rows and columns, yet with the columns eps and calldate completely filled with NaN. Might this be because I want to merge each row twice to df? The problem is not merging on more than one key - the problem might be that in the first dataframe I always have two rows with the same combination of ticker/year/quarter.
I hope someone can help me! Thank you! Julia