0

i've seen some similar question but can't figure out how to handle my problem.

I have a dataset with evereyday total snow values from 1970 till 2015.

Now i want to find out when there was the first and the last day with snow.

I want to do this for every season. One season should be from, for example 01.06.2000 - 30.5.2001, this season is then Season 2000/2001.

I have already set my date column as index(format year-month-day, 2006-04-24)

When I select a specific range with

df_s = df["2006-04-04" : "2006-04-15"]

I am able to find out the first and last day with snow in this period with

firstsnow = df_c[df_c['Height'] > 0].head(1)

lastsnow = df_c[df_c['Height'] > 0].tail(1)

I want to do this now for the whole dataset, so that I'm able to compare each season and see how the time of first snow changed.

My dataframe looks like this(here you see a selected period with values),Height is Snowheight, Diff is the difference to the previous day. Height and Diff are Float64.

  
             Height    Diff
Date                      
2006-04-04   0.000     NaN
2006-04-05   0.000   0.000
2006-04-06   0.000   0.000
2006-04-07  16.000  16.000
2006-04-08   6.000 -10.000
2006-04-09   0.001  -5.999
2006-04-10   0.000  -0.001
2006-04-11   0.000   0.000
2006-04-12   0.000   0.000
2006-04-13   0.000   0.000
2006-04-14   0.000   0.000
2006-04-15   0.000   0.000
(12, 2)
<class 'pandas.core.frame.DataFrame'>

I think i have to work with the groupby function, but i don't know how to apply this function in this case.

  • Welcome to SO! A [minimal reproducible](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) would help answer. Glancing at your problem, I'd suggest adding a column for Season, calculating this from the relevant date ranges. Once you have this new column you can groupby Season and find first / last snow dates. – ivanp Oct 28 '22 at 11:08
  • Thank you, i'm working with seasons now that start in july every year and it works! i did it with this: df_c['Timestamp'] = pd.to_datetime(df_c.Timestamp) df_c['Season'] = df_c['Timestamp'].dt.to_period('Q-Jun').dt.qyear.apply(lambda x: str(x-1) + "-" + str(x)) – tryingnottodiewhilecoding Oct 31 '22 at 13:02

1 Answers1

0

You can use the trick to create new column with only positive value, and None otherwise. Then use ffill and bfill to get the head and tail

Sample data:

df = pd.DataFrame({'name': ['a1','a2','a3','a4','a5','b1','b2','b3','b4','b5'],
                   'gr':[1]*5+[2]*5,
                   'val1':[None,-1,2,1,None,-1,4,7,3,-2]})

Input:

  name  gr  val1
0   a1   1   NaN
1   a2   1  -1.0
2   a3   1   2.0
3   a4   1   1.0
4   a5   1   NaN
5   b1   2  -1.0
6   b2   2   4.0
7   b3   2   7.0
8   b4   2   3.0
9   b5   2  -2.0

Set positive then ffill and bfill:

df['positive'] = np.where(df['val1']>0, df['val1'], None)
df['positive'] = df.groupby('gr')['positive'].apply(lambda g: g.ffill())
df['positive'] = df.groupby('gr')['positive'].apply(lambda g: g.bfill())

Check result:

df.groupby('gr').head(1)
df.groupby('gr').tail(1)

  name  gr  val1  positive
0   a1   1   NaN       2.0
5   b1   2  -1.0       4.0

  name  gr  val1  positive
4   a5   1   NaN       1.0
9   b5   2  -2.0       3.0
PTQuoc
  • 938
  • 4
  • 13