I want to prepare a pd.DataFrame with data related to machine maintenance. The data is a time series. I want to clean my targets (df['entry'] in the example below) so that only the first 2 elements of each pattern are kept. I have a POC using Series.shift, but it can miss some events (the last event in the example below). The pd.DataFrame below contains 4 pattern starts. Any idea how to create a feature that cleans my dataset and keeps only the first n elements of each pattern?
What I have so far:
import pandas as pd

df = pd.DataFrame({'entry': [0,1,1,1,1,1,0,0,1,1,0,0,0,1,0,1,0],
                   'Expected':[0,1,1,0,0,0,0,0,1,1,0,0,0,1,0,1,0],
                   'comment': ['', 'keep', 'keep', 'drop', 'drop', 'drop', '', '', 'keep', 'keep', '', '', '', 'keep', '', 'How to get that one ?', '']})
# value of 'entry' two rows earlier (0 where there is no such row)
df['shifted'] = df['entry'].shift(2).fillna(0)
def first(entry):
    # flag a row when it is 1 and the row two steps back was 0
    return entry['entry']==1 and entry['shifted']==0
df['calculated'] = df.apply(first, axis=1)
df
Below is what I get from my script. The second-to-last row is calculated wrong (the start of a pattern is missed):
entry  Expected  comment                shifted  calculated
0      0                                0.0      False
1      1         keep                   0.0      True
1      1         keep                   0.0      True
1      0         drop                   1.0      False
1      0         drop                   1.0      False
1      0         drop                   1.0      False
0      0                                1.0      False
0      0                                1.0      False
1      1         keep                   0.0      True
1      1         keep                   0.0      True
0      0                                1.0      False
0      0                                1.0      False
0      0                                0.0      False
1      1         keep                   0.0      True
0      0                                0.0      False
1      1         How to get that one ?  1.0      False
0      0                                0.0      False
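As far as I can tell, the miss happens because shift(2) only looks at the value exactly two rows back: the pattern at 0-based position 15 starts two rows after the single-element pattern at position 13, so 'shifted' is 1 there. One direction that might avoid this, as a minimal sketch rather than a validated solution, is to label each run of consecutive equal values and count the position of every row inside its run. The names n, run_id and pos_in_run are illustrative and not part of my script.

```python
import pandas as pd

df = pd.DataFrame({'entry': [0,1,1,1,1,1,0,0,1,1,0,0,0,1,0,1,0]})

n = 2  # assumption: number of leading elements to keep per pattern

# A new run starts wherever the value differs from the previous row,
# so the cumulative sum of those change points gives one id per run.
run_id = df['entry'].ne(df['entry'].shift()).cumsum()

# 0-based position of each row inside its own run.
pos_in_run = df.groupby(run_id).cumcount()

# Keep a 1 only when it is among the first n rows of a run of 1s.
df['calculated'] = ((df['entry'] == 1) & (pos_in_run < n)).astype(int)
```

On the example data this should reproduce the 'Expected' column, including the last single-element pattern. It assumes a pattern is simply an unbroken run of 1s in 'entry'.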
Comments are welcome.