0

ur has 2 columns and 20 rows and looks like this:

DATE         UNRATE       
1997-12-31  4.941667
1998-12-31  4.500000
1999-12-31  4.216667

atddf also has 2 cols and 20 rows and looks like this:

              attendance
yearID                  
1997-01-01  2.256025e+06
1998-01-01  2.353372e+06
1999-01-01  2.337979e+06

I want to find the correlation between the attendance and UNRATE columns but believe I need to join or merge the dataframes first. How do I join/merge on the yearID/DATE columns so the years match up?

I have tried these techniques and get the following errors:

join = ur.merge(atd_df, on="DATE")
KeyError: 'DATE'

join = ur.merge(atd_df, on="yearID")
KeyError: 'yearID'

res = pd.merge(urdf.assign(grouper=urdf['DATE'].dt.to_period('Y')),
               atddf.assign(grouper=atddf['yearID'].dt.to_period('Y')),
               how='left', on='grouper')
KeyError: 'DATE'
ChrisC
  • 1
  • you need to use the param `left_on` and `right_on` with merge in this case. Here are the [pandas merge docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html); `ur.merge(atd_df, left_on="DATE", right_on="yearID")`. However, it does look like `yearID` is an index so `ur.merge(atd_df, left_on="DATE", right_index=True)` – It_is_Chris Mar 30 '20 at 13:29
  • `newdf = urdf.merge(atddf, left_on="DATE",right_on="yearID",right_index=True) newdf` brings me an empty data set: `UNRATE attendance DATE` and so does `newdf = urdf.merge(atddf, left_on="DATE", right_index=True) newdf` – ChrisC Mar 30 '20 at 14:02

0 Answers0