1

I have 2 datasets:

df1:

teamid  points  startdate  enddate
1       30      2017-07-01 2018-06-30
2       41      2016-07-01 2017-06-30
3       32      2016-07-01 2017-06-30

df2:

teamid  date        color
1       2017-01-02  red
1       2018-01-02  yellow
2       2017-06-05  blue
3       2014-01-05  red
4       2016-03-02  brown

I want to filter df2 on matching columns df1.teamid == df2.teamid and df2.date between df1.startdate and df1.enddate.

I tried many variations of:

df2_filtered = df2[(df2['teamid'].isin(df1['teamid'])) & (df2['date'] >= df1['startdate']) & (df2['date'] <= df1['enddate'])]

Which gives me a ValueError: Can only compare identically-labeled Series objects.

I also tried

df2_filtered = df2[(df2['teamid'].isin(df1['teamid'])) & (str(df2['date']) >= df1['startdate']) & (str(df2['date']) <= df1['enddate'])]

This results in 0 rows. Based on df1 and df2 there are matching rows that should pop up (rows 2 and 3 from df2).

How should I set the filter and why is the last option not working?

zef
  • 649
  • 1
  • 7
  • 22

1 Answers1

3

IIUC

ndf = pd.merge(df,df2, on='teamid', how='outer')
ndf.loc[ndf.date.between(ndf.startdate, ndf.enddate)]

    teamid  points  startdate   enddate date    color
1   1   30.0    2017-07-01  2018-06-30  2018-01-02  yellow
2   2   41.0    2016-07-01  2017-06-30  2017-06-05  blue
rafaelc
  • 57,686
  • 15
  • 58
  • 82