3

Trying to use pandas rolling with a timeindex and the count() method, getting an error, what am I missing here?

here is an example:

d = {'vv': {pd.Timestamp('2020-01-13 08:22:00', freq='T'): 'aa',
pd.Timestamp('2020-01-13 08:23:00', freq='T'): 'bb',
pd.Timestamp('2020-01-13 08:24:00', freq='T'): 'cc',
pd.Timestamp('2020-01-13 08:25:00', freq='T'): np.nan,
pd.Timestamp('2020-01-13 08:26:00', freq='T'): 'dd'}}

df = pd.DataFrame(d)

df['vv'].rolling('72s').count()

getting this:

ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _prep_values(self, values)
    354             try:
--> 355                 values = ensure_float64(values)
    356             except (ValueError, TypeError) as err:

pandas\_libs\algos_common_helper.pxi in pandas._libs.algos.ensure_float64()

ValueError: could not convert string to float: 'aa'

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _apply(self, func, center, require_min_periods, floor, is_weighted, name, use_numba_cache, **kwargs)
    535             try:
--> 536                 values = self._prep_values(b.values)
    537 

~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _prep_values(self, values)
    356             except (ValueError, TypeError) as err:
--> 357                 raise TypeError(f"cannot handle this type -> {values.dtype}") from err
    358 

TypeError: cannot handle this type -> object

The above exception was the direct cause of the following exception:

DataError                                 Traceback (most recent call last)
<ipython-input-166-8676c880bf7e> in <module>
      7 df = pd.DataFrame(d)
      8 
----> 9 df['vv'].rolling('72s').count()

~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in count(self)
   2048         if self.is_freq_type or isinstance(self.window, BaseIndexer):
   2049             window_func = self._get_roll_func("roll_count")
-> 2050             return self._apply(window_func, center=self.center, name="count")
   2051 
   2052         return super().count()

~\Anaconda3\lib\site-packages\pandas\core\window\rolling.py in _apply(self, func, center, require_min_periods, floor, is_weighted, name, use_numba_cache, **kwargs)
    542                     continue
    543                 else:
--> 544                     raise DataError("No numeric types to aggregate") from err
    545 
    546             if values.size == 0:

DataError: No numeric types to aggregate
Ezer K
  • 3,637
  • 3
  • 18
  • 34

1 Answers1

2

Pandas rolling time window fails on count of string - why?

I think it is bug. It working with numeric value of window well:

s = df['vv'].rolling(4).count()
print (s)
2020-01-13 08:22:00    1.0
2020-01-13 08:23:00    2.0
2020-01-13 08:24:00    3.0
2020-01-13 08:25:00    3.0
2020-01-13 08:26:00    3.0
Name: vv, dtype: float64

One possible idea is use count with replace non missing values by Series.isna to 1:

d = {'vv': {pd.Timestamp('2020-01-13 08:22:00', freq='T'): 'aa',
pd.Timestamp('2020-01-13 08:23:00', freq='T'): 'bb',
pd.Timestamp('2020-01-13 08:24:00', freq='T'): 'cc',
pd.Timestamp('2020-01-13 08:25:00', freq='T'): np.nan,
pd.Timestamp('2020-01-13 08:26:00', freq='T'): 'dd'}}

df = pd.DataFrame(d)

df = df['vv'].where(df['vv'].isna(), 1).rolling('72s').count()
print (df)
2020-01-13 08:22:00    1.0
2020-01-13 08:23:00    2.0
2020-01-13 08:24:00    2.0
2020-01-13 08:25:00    1.0
2020-01-13 08:26:00    1.0
Name: vv, dtype: float64

Details:

print (df['vv'].where(df['vv'].isna(), 1))
2020-01-13 08:22:00      1
2020-01-13 08:23:00      1
2020-01-13 08:24:00      1
2020-01-13 08:25:00    NaN
2020-01-13 08:26:00      1
Name: vv, dtype: object

Fist idea with test not missing values and sum:

df = df['vv'].notna().rolling('72s').sum()
print (df)
2020-01-13 08:22:00    1.0
2020-01-13 08:23:00    2.0
2020-01-13 08:24:00    2.0
2020-01-13 08:25:00    1.0
2020-01-13 08:26:00    1.0
Name: vv, dtype: float64
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • hmm, makes sense, but for some reason the numbers don't seem correct, shouldn't, for example, 8:24 be 2 ('aa' and 'bb' occured within 72 secs) ? – Ezer K Aug 16 '20 at 14:31
  • Thanks, ".notna().rolling('72s').sum()" seems to work as well (found this following your first version) – Ezer K Aug 16 '20 at 14:38
  • @EzerK - oops, then I was wrong, testing miisng values. Add to answer. – jezrael Aug 16 '20 at 14:39