0

I have a dataframe laid out like the following with site names and a range of dates for each site.

Site       Date
Site_1     02/09/2011
Site_1     03/09/2011
Site_1     04/09/2011
Site_1     05/09/2011
Site_2     14/01/2010
Site_2     15/01/2010
Site_2     16/01/2010
Site_2     17/01/2010

I'd like to find the start and end dates for each site and lay the dataframe out like this:

Site       Start_Date     End_Date
Site_1     02/09/2011     04/09/2011
Site_2     14/01/2010     17/01/2010

I know I can find the min and max of the full range like this:

df['Date'].max()
df['Date'].min()

Just not sure what the best method is to apply this separately to each site.

dinn_
  • 93
  • 1
  • 10
  • Does this answer your question? [Max and Min date in pandas groupby](https://stackoverflow.com/questions/25024797/max-and-min-date-in-pandas-groupby) – BigBen Dec 13 '21 at 15:02

2 Answers2

3

I would advise using a groupby on the "site" column and aggregating each group into a min and max date.

df.groupby("Site").agg({'date': ['min', 'max']})

This will return the min and max date for each site.

I haven't tried out the code, but it should do what you want.

Grinjero
  • 436
  • 2
  • 7
1

Using the advanced syntax of groupby+agg you can directly get your custom format:

df.groupby('Site')['Date'].agg([('Start_Date', 'min'), ('End_Date', 'max')])

output:

        Start_Date    End_Date
Site                          
Site_1  02/09/2011  05/09/2011
Site_2  14/01/2010  17/01/2010
mozway
  • 194,879
  • 13
  • 39
  • 75