I'm struggeling with some data massaging process. I'm thinking of a best way to find and remove non-changing entries in dataframe.
I have signal level data that comes from mobile device
measDate measLatitude measLongitude measCellId measNetTypeDetail measOperatorCode measSignalLevel
2019-06-05 00:22:10.791 27.676038 84.177025 14603 13 42902 -97
2019-06-05 00:22:11.806 27.676038 84.177025 14603 13 42902 -97
2019-06-05 00:22:14.179 27.676038 84.177025 14604 13 42902 -97
2019-06-05 00:22:14.279 27.676038 84.177025 14604 13 42902 -97
2019-06-05 00:22:16.657 27.676038 84.177025 14604 13 42902 -97
2019-06-05 00:22:18.904 27.676038 84.177025 14615 13 42902 -96
2019-06-05 00:22:21.276 27.676038 84.177025 14615 13 42902 -96
2019-06-05 00:22:23.557 27.676038 84.177025 14614 13 42902 -95
2019-06-05 00:22:24.796 27.676038 84.177025 14603 10 42902 -96
2019-06-05 00:22:26.768 27.676038 84.177025 14603 10 42902 -96
2019-06-05 00:22:27.787 27.676038 84.177025 14603 10 42902 -96
2019-06-05 00:22:28.802 27.676038 84.177025 14603 10 42902 -96
2019-06-05 00:22:31.803 27.676038 84.177025 14603 10 42902 -96
2019-06-05 00:22:33.799 27.676038 84.177025 14603 10 42902 -96
so basically signalLevel and some associated data + timestamp
What I need to do is to filter data in following way:
- if signal level does not change for the past n-samples (eg. n=3 samples)
- if other values does not change as well
- filter-out the data.
So in the end I'd like to have maximum n-same-samples in row.
Expected results with WINDOW N=3
measDate measLatitude measLongitude measCellId measNetTypeDetail measOperatorCode measSignalLevel
2019-06-05 00:22:10.791 27.676038 84.177025 14603 13 42902 -97
2019-06-05 00:22:11.806 27.676038 84.177025 14603 13 42902 -97
2019-06-05 00:22:14.179 27.676038 84.177025 14604 13 42902 -97
2019-06-05 00:22:18.904 27.676038 84.177025 14615 13 42902 -96
2019-06-05 00:22:21.276 27.676038 84.177025 14615 13 42902 -96
2019-06-05 00:22:23.557 27.676038 84.177025 14614 13 42902 -95
2019-06-05 00:22:24.796 27.676038 84.177025 14603 10 42902 -96
2019-06-05 00:22:26.768 27.676038 84.177025 14603 10 42902 -96
2019-06-05 00:22:27.787 27.676038 84.177025 14603 10 42902 -96
I've came to this code after some tests. I try to check if any of the columns value changed by checking unique values within window per column (here window size =3) then I sum if any of the column changed then I need to keep it
RadioSmall = RadioMeasAll.loc[:,['measLatitude','measLongitude','measCellId','measNetTypeDetail','measOperatorCode','measCid','measLac','measSignalLevel','cellArfcn']].copy()
def f(x):
y = x[~np.isnan(x)]
if len(np.unique(y)) > 1:
return 1
else:
return 0
a = RadioSmall.rolling(window=3, min_periods=1).apply(f,raw=True)
a['sum']=a.sum(axis=1)
b = pd.DataFrame(index=a.index)
b['filtering'] = a['sum']
df_filtered = b.query('filtering>0')
RadioMeasAll.join(df_filtered)
Looks very ugly to me.
Thanks for help!