0

I have the following dataframe:

df = pd.DataFrame({'Date': ['2020-01-01', '2020-10-01', '2020-01-05', '2021-01-01', '2021-10-01', '2021-01-15', '2021-08-11'],
                   'ID': [101, 101, 101, 102, 102, 101, 101],
                   'ID2': [20, 15, 20, 11, 11, 15, 15]})

#looks like this 
    Date        ID  ID2
0   2020-01-01  101 20
1   2020-10-01  101 15
2   2020-01-05  101 20
3   2021-01-01  102 11
4   2021-10-01  102 11
5   2021-01-15  101 15
6   2021-08-11  101 15

I would like to group data first by ID than ID2 and find the min and max date and put them in separate columns having the following ouput:

    Start_Date End_Date     ID  ID2
0   2020-01-01 2020-01-05   101 20
1   2020-10-01 2021-08-11   101 15
2   2021-01-01 2021-10-01   102 11

Is this possible?

Rocco
  • 141
  • 6

1 Answers1

1

First convert values to datetimes, aggregate by GroupBy.agg and last change columns ordering with indexing:

df['Date'] = pd.to_datetime(df['Date'])

df = (df.groupby(['ID','ID2'], sort=False, as_index=False)
        .agg(Start_Date=('Date','min'), End_Date=('Date','max')))
print (df)
    ID  ID2 Start_Date   End_Date
0  101   20 2020-01-01 2020-01-05
1  101   15 2020-10-01 2021-08-11
2  102   11 2021-01-01 2021-10-01

df = df[df.columns[-2:].tolist() + df.columns[:2].tolist()]
print (df)
  Start_Date   End_Date   ID  ID2
0 2020-01-01 2020-01-05  101   20
1 2020-10-01 2021-08-11  101   15
2 2021-01-01 2021-10-01  102   11
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you, to remove the agregated indexes I added an argument. df.groupby(['ID','ID2'],as_index=False).agg(Start_Date=('Date','max'), End_Date =('Date','min')) – Rocco Mar 18 '22 at 09:31