group values containing np.nan in intervals

Question

I have a pandas series containing zeros, ones and np.nan:

import pandas as pd
import numpy as np
df1 = pd.Series([ 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, np.nan, np.nan, 1])
df1
Out[6]: 
0     0.0
1     0.0
2     0.0
3     0.0
4     0.0
5     1.0
6     1.0
7     1.0
8     0.0
9     0.0
10    0.0
11    NaN
12    NaN
13    1.0
dtype: float64

I would like to create a dataframe df2 that contains the start and end index of each interval of consecutive identical values, together with the associated value. In this case df2 should be:

df2
Out[5]: 
   Start  End  Value
0      0    4      0
1      5    7      1
2      8   10      0
3     11   12    NaN
4     13   13      1

I tried a solution from another question:

s = df1.ne(df1.shift()).cumsum()
df2 = (df1.groupby(s)
          .apply(lambda x: pd.Series([x.index[0], x.index[-1], x.iat[0]],
                                     index=['Start','End','Value']))
          .unstack().reset_index(drop=True))

However, it does not work in this case, because the two consecutive NaN values end up in separate intervals:

df2
Out[11]: 
   Start   End  Value
0    0.0   4.0    0.0
1    5.0   7.0    1.0
2    8.0  10.0    0.0
3   11.0  11.0    NaN
4   12.0  12.0    NaN
5   13.0  13.0    1.0

If you want to know the reason why NaN values never compare equal, it's explained in this question. This is not python or numpy specific. https://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values — Håken Lid, Jun 29 '17 at 11:39
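
To see the effect this comment describes, here is a minimal sketch on a shortened series (the names `changed` and `group_ids` are only illustrative). Because NaN never compares equal to NaN, `ne` marks every NaN as a change from the previous row, so each one gets its own group id:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, np.nan, np.nan, 1])

# NaN is not equal to anything, including itself
print(np.nan == np.nan)

# Every row differs from its predecessor, including the second NaN
changed = s.ne(s.shift())
print(changed.tolist())

# Hence each NaN lands in its own group
group_ids = changed.cumsum()
print(group_ids.tolist())
```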

score 2 · Accepted Answer · answered Jun 29 '17 at 10:34

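NaN never compares equal to itself, so each NaN looks like a change from the previous value. You can work around this by temporarily filling the NaNs with a placeholder value that does not occur in the data. The placeholder is only used to build the grouping key; df1 itself is untouched, so Value still shows NaN.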

In [361]: s = df1.fillna('-dummy-').ne(df1.fillna('-dummy-').shift()).cumsum()

In [362]: (df1.groupby(s)
     ...:     .apply(lambda x: pd.Series([x.index[0], x.index[-1], x.iat[0]],
     ...:                                index=['Start','End','Value']))
     ...:     .unstack().reset_index(drop=True))
Out[362]:
   Start   End  Value
0    0.0   4.0    0.0
1    5.0   7.0    1.0
2    8.0  10.0    0.0
3   11.0  12.0    NaN
4   13.0  13.0    1.0
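
If you would rather not pick a placeholder value, a possible variant (a sketch, not the approach used above) is to treat "both current and previous value are NaN" as "no change" explicitly. The names `both_nan`, `new_run`, `run_id` and `idx` are just illustrative. Aggregating the index and the values separately also keeps Start and End as integers:

```python
import numpy as np
import pandas as pd

df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, np.nan, np.nan, 1])

prev = df1.shift()

# A new run starts where the value differs from the previous one,
# except when both are NaN (NaN != NaN would otherwise count as a change)
both_nan = df1.isna() & prev.isna()
new_run = df1.ne(prev) & ~both_nan
new_run.iloc[0] = True  # the first row always opens a run

run_id = new_run.cumsum()

# Index labels as a Series, so first/last can be taken per run
idx = df1.index.to_series()

df2 = pd.DataFrame({
    "Start": idx.groupby(run_id).first(),
    "End": idx.groupby(run_id).last(),
    # all values in a run are identical, so first() yields that value,
    # or NaN for an all-NaN run
    "Value": df1.groupby(run_id).first(),
}).reset_index(drop=True)

print(df2)
```

Value remains float because the column contains NaN.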
