
I have a DataFrame that looks like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': range(9), 'col2': list(range(6)) + [np.nan] * 3},
                  index=pd.date_range('1/1/2000', periods=9, freq='T'))

df
Out[63]: 
                     col1  col2
2000-01-01 00:00:00     0   0.0
2000-01-01 00:01:00     1   1.0
2000-01-01 00:02:00     2   2.0
2000-01-01 00:03:00     3   3.0
2000-01-01 00:04:00     4   4.0
2000-01-01 00:05:00     5   5.0
2000-01-01 00:06:00     6   NaN
2000-01-01 00:07:00     7   NaN
2000-01-01 00:08:00     8   NaN

When I resample it using last(), I get:

df.resample('3T', label='right', closed='right').last()
Out[60]: 
                     col1  col2
2000-01-01 00:00:00     0   0.0
2000-01-01 00:03:00     3   3.0
2000-01-01 00:06:00     6   5.0
2000-01-01 00:09:00     8   NaN

As shown above, the row at minute 6 has data in col1, so after resampling col1 takes its value from that row, but col2 is filled from the row at minute 5. Is there a way to make sure both values come from the minute-6 row? In other words, if col1 has data, the resample should not fill col2's NaN with the last non-null value, but simply leave it as NaN.

Desired output:
                     col1  col2
2000-01-01 00:00:00     0   0.0
2000-01-01 00:03:00     3   3.0
2000-01-01 00:06:00     6   NaN  <--- if at least one column has data, the whole row is used in the resample
2000-01-01 00:09:00     8   NaN
tesla1060
  • Relevant: https://stackoverflow.com/questions/55583246/what-is-different-between-groupby-first-groupby-nth-groupby-head-when-as-index/55583395#55583395 – ALollz Apr 29 '19 at 03:23

2 Answers


That is how last works in pandas: it returns the last non-null value in each group. If you want the last value including NaN, use iloc with apply:

df.resample('3T', label='right', closed='right').apply(lambda x : x.iloc[-1])
Out[103]: 
                     col1  col2
2000-01-01 00:00:00     0   0.0
2000-01-01 00:03:00     3   3.0
2000-01-01 00:06:00     6   NaN
2000-01-01 00:09:00     8   NaN
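
To make the difference easy to check, here is a minimal side-by-side sketch on the question's data. The names skips_nan and keeps_nan are just illustrative, and it spells the frequency as 'min' instead of 'T' on the assumption that a recent pandas is installed, where the 'T' alias is deprecated.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with the frequency written as 'min'.
df = pd.DataFrame(
    {"col1": range(9), "col2": list(range(6)) + [np.nan] * 3},
    index=pd.date_range("2000-01-01", periods=9, freq="min"),
)

# last() skips NaN: each column gets its own last non-null value per bin.
skips_nan = df.resample("3min", label="right", closed="right").last()

# iloc[-1] is purely positional: it takes the final entry of each bin,
# whether or not that entry is NaN.
keeps_nan = (
    df.resample("3min", label="right", closed="right")
    .apply(lambda x: x.iloc[-1])
)

ts = pd.Timestamp("2000-01-01 00:06:00")
print(skips_nan.loc[ts, "col2"])  # value carried over from minute 5
print(keeps_nan.loc[ts, "col2"])  # NaN, taken from the minute-6 row itself
```

Every bin here contains at least one row. If your real data has gaps that produce empty bins, iloc[-1] has nothing to index, so the lambda may need a guard for that case.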
BENY

Also possible with .nth(-1) or .tail(1) using ceil to form the same groups:

df.groupby(df.index.ceil('3T')).nth(-1)
                     col1  col2
2000-01-01 00:00:00     0   0.0
2000-01-01 00:03:00     3   3.0
2000-01-01 00:06:00     6   NaN
2000-01-01 00:09:00     8   NaN
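
The .tail(1) variant can be sketched as follows. One caveat: tail keeps the original row labels rather than the group keys, and as far as I know nth does the same from pandas 2.0 onward, so this sketch relabels the result with ceil to match the resample output. The names keys and last_rows are illustrative, and 'min' is used instead of 'T' on the assumption of a recent pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"col1": range(9), "col2": list(range(6)) + [np.nan] * 3},
    index=pd.date_range("2000-01-01", periods=9, freq="min"),
)

# Rounding each timestamp up to the next 3-minute mark gives the same
# groups as resample('3min', label='right', closed='right').
keys = df.index.ceil("3min")

# tail(1) keeps the last row of each group as-is, NaN included,
# but the rows still carry their original timestamps.
last_rows = df.groupby(keys).tail(1)

# Relabel each kept row with the right edge of its bin.
last_rows = last_rows.set_axis(last_rows.index.ceil("3min"), axis=0)

print(last_rows)
```

Because tail only selects existing rows, bins with no data simply do not appear in the result, whereas resample would emit them.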
ALollz