How to apply a function not returning a numeric value to a pandas rolling Window?

Question

I have a time series (datetime index) of dtype float64. I am trying to apply a custom function to a rolling window on the series, and I want this function to return strings. However, this raises a TypeError. Why does this happen, and is there a way to make this work directly with a single function application?

Here is an example:

import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1,high=100,size=100),index=[pd.date_range(start='2000-01-01',freq='W',periods=100)])
number_series = number_series.apply(lambda x: float(x))

def func(s):
    
    if s[-1] > s[-2] > s[-3]:
        return 'High'
    elif s[-1] > s[-2]:
        return 'Medium'
    else:
        return 'Low'

new_series = number_series.rolling(5).apply(func)

The result is the following error:

TypeError: must be real number, not str

My current workaround is to change func so that it returns integers, which produces a float series, and then to apply a second function to that series to map the codes to strings, as in the example below:

def func_float(s):
    
    if s[-1] > s[-2] > s[-3]:
        return 1
    elif s[-1] > s[-2]:
        return 2
    else:
        return 3
    
float_series = number_series.rolling(5).apply(func_float)

def func_text(s):

    if s == 1:
        return 'High'
    elif s == 2:
        return 'Medium'
    else:
        return 'Low'
    
new_series = float_series.apply(func_text)

This gives the expected result from the initial code that generated the error:

new_series

2000-01-02       Low
2000-01-09       Low
2000-01-16       Low
2000-01-23       Low
2000-01-30    Medium
               ...  
2001-10-28       Low
2001-11-04    Medium
2001-11-11      High
2001-11-18      High
2001-11-25       Low
Length: 100, dtype: object
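The two-step workaround can be written a little more compactly. The sketch below is one possible variant, not the asker's code: it assumes Series.map with a dict in place of func_text, passes raw=True so each window is a NumPy array (making s[-1] unambiguously positional), and uses a plain DatetimeIndex instead of the list-wrapped one. Note that rolling(5) leaves the first four rows as NaN because fewer than five values are available; func_text turns those NaN values into 'Low' (the else branch), which is why the expected output starts with four 'Low' rows, whereas map keeps them as NaN.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100).astype(float),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)

def func_code(s):
    # s is a NumPy array because raw=True is passed below
    if s[-1] > s[-2] > s[-3]:
        return 1
    elif s[-1] > s[-2]:
        return 2
    return 3

# Step 1: numeric codes per window (NaN for the first 4 incomplete windows)
codes = number_series.rolling(5).apply(func_code, raw=True)

# Step 2: translate codes to labels; NaN has no dict entry and stays NaN
labels = codes.map({1.0: 'High', 2.0: 'Medium', 3.0: 'Low'})
```

Append .fillna('Low') if you want the first four rows to match the output above exactly.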

I think your issue stems from the fact that a numpy series must always contain the same type of data, so when you attempt to convert the first float to a string you get the error — itprorh66, Feb 24 '21 at 16:12
This is what is confusing though. Since in the two step approach, the second step is changing the data type from float to string. Perhaps wrapping it up alongside the rolling method specifically leads to the issue somehow. — agftrading, Feb 25 '21 at 08:36

Gerd · Answer 1 · 2021-03-25T16:27:35.310

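Note that the apply method of a Rolling object is different from the apply method of a Series object, and I agree with you that this is a bit confusing. Series.apply calls the function on each element and accepts any return value, whereas Rolling.apply must reduce each window to a single numeric value, because pandas collects the results in a float64 array; a string cannot be stored there, hence the TypeError. In other words, the functions applied to rolling windows are meant for aggregation of data (such as sum, count etc.).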
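A small sketch of the difference on a toy series (the data and lambdas are made up for illustration): Series.apply can return strings, Rolling.apply works with numeric results, and a string result fails. The asker reports a TypeError; the except clause also catches ValueError only as a precaution in case the error type differs between pandas versions.

```python
import pandas as pd

s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])

# Series.apply calls the function on each element; any return type is allowed
sizes = s.apply(lambda x: 'big' if x > 2 else 'small')

# Rolling.apply must reduce each window to one number (raw=True -> ndarray)
diffs = s.rolling(2).apply(lambda w: w[-1] - w[0], raw=True)

# A string cannot be stored in the float64 result, so this call fails
try:
    s.rolling(2).apply(lambda w: 'up' if w[-1] > w[0] else 'down', raw=True)
    raised = False
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)
```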

However, you can iterate over the rolling windows (each window is a Series), apply the function to each window yourself, and collect the results in a list (thanks to this discussion).
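To see what the iteration yields, here is a sketch on a toy series, assuming pandas 1.1 or later (the version in which iteration over Rolling objects was added, to my knowledge). Each window is a Series that keeps its timestamps, and the first windows are shorter than the window size.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 15.0, 30.0, 25.0, 40.0],
              index=pd.date_range('2000-01-02', periods=6, freq='W'))

# Each item produced by iterating the Rolling object is a Series window
windows = list(s.rolling(3))

print([len(w) for w in windows])   # lengths grow 1, 2, 3, then stay at 3
print(windows[-1].index[-1] == s.index[-1])  # windows keep their timestamps
```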

So my approach would be:

import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1,high=100,size=100),index=[pd.date_range(start='2000-01-01',freq='W',periods=100)])
number_series = number_series.apply(lambda x: float(x))

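def func(s):
    # each window is a Series, so use .iloc for positional access
    if len(s) > 2:
        if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
            return 'High'
        elif s.iloc[-1] > s.iloc[-2]:
            return 'Medium'
        else:
            return 'Low'
    else:
        return ''

# avoid naming the result `list`, which would shadow the built-in list()
labels = [func(window) for window in number_series.rolling(5)]
new_series = pd.Series(labels, index=number_series.index)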

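Also note that func needs to handle the first windows differently: iterating over a Rolling object also yields the initial, shorter windows (with one and two values), so indexing the third-to-last value would otherwise be out of bounds.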
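Because func only compares the last three values of each window, a swapped-in alternative (not part of this answer) is to skip the per-window loop and compare shifted copies of the series with np.select. Comparisons against the NaN values produced by shift() evaluate to False, so rows without enough history fall through to 'Low'. This sketch assumes a plain DatetimeIndex; its labels for the first rows follow the partial-window logic of func above (row 2 can be 'Medium'), not the question's output, where the first four rows are always 'Low'.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100).astype(float),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)

# Values one and two steps back; the first rows become NaN
prev1 = number_series.shift(1)
prev2 = number_series.shift(2)

# Conditions are checked in order, like the if/elif chain in func
new_series = pd.Series(
    np.select(
        [(number_series > prev1) & (prev1 > prev2), number_series > prev1],
        ['High', 'Medium'],
        default='Low',
    ),
    index=number_series.index,
)
```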

Good assessment of the issue! I think the solution just requires one line: `new_series = pd.Series(number_series.rolling(5)).apply(func)` — Dustin Michels, Mar 27 '21 at 10:33
@DustinMichels: Very good! However, this creates an entirely new Series and loses the original timestamp index, so you would still have to take care of this. — Gerd, Mar 27 '21 at 20:43

score 1 · Answer 2 · answered Mar 25 '21 at 15:26

One approach is to:

Get the window bounds from the WindowIndexer of the rolling() object.
Apply func to each window and store the string results in a list.
Convert the results back to a Series.

import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1,high=100,size=100),index=[pd.date_range(start='2000-01-01',freq='W',periods=100)])
number_series = number_series.apply(lambda x: float(x))

def func(s):
    # s is a Series slice, so use .iloc for positional access
    if (len(s) >= 3) and (s.iloc[-1] > s.iloc[-2] > s.iloc[-3]):
        return 'High'
    elif (len(s) >= 2) and s.iloc[-1] > s.iloc[-2]:
        return 'Medium'
    else:
        return 'Low'

# Step 1: Get the window indexer (_get_window_indexer is a private method
# and may change between pandas versions)
window_indexer = number_series.rolling(5)._get_window_indexer()
start, end = window_indexer.get_window_bounds(num_values=len(number_series))

# Step 2: Apply func to each window
results = [func(number_series.iloc[s:e]) for s, e in zip(start, end)]

# Step 3: Get results back to a pandas Series
new_series = pd.Series(results, index=number_series.index)

new_series
>>>
2000-01-02       Low
2000-01-09       Low
2000-01-16    Medium
2000-01-23       Low
2000-01-30    Medium
               ...  
2001-10-28       Low
2001-11-04    Medium
2001-11-11      High
2001-11-18      High
2001-11-25       Low
Length: 100, dtype: object
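Step 1 relies on _get_window_indexer(), a private method (leading underscore) that pandas does not promise to keep stable. For a fixed integer window the bounds are simple enough to compute directly; the sketch below assumes exactly that case and would not cover offset-based windows such as '30D'. Like the code above, it evaluates partial windows too, which is why row 2 is 'Medium' here but 'Low' in the question's output.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100).astype(float),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)

def func(s):
    # positional access with .iloc works whatever the index type is
    if len(s) >= 3 and s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return 'High'
    elif len(s) >= 2 and s.iloc[-1] > s.iloc[-2]:
        return 'Medium'
    return 'Low'

window = 5
n = len(number_series)

# Step 1 (public API only): for a trailing window of fixed size, row i
# covers positions max(0, i - window + 1) up to and including i
end = np.arange(1, n + 1)
start = np.maximum(0, end - window)

# Steps 2 and 3 as in the answer
results = [func(number_series.iloc[s:e]) for s, e in zip(start, end)]
new_series = pd.Series(results, index=number_series.index)
```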

score 0 · Answer 3 · answered Mar 30 '21 at 15:53

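Here's another way, using the boolean 'or' trick with a list and the pd.Series constructor. list.append() returns None, so `l.append(...) or 0` stores the string in l and hands rolling.apply the number it requires: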

import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1,high=100,size=100),index=[pd.date_range(start='2000-01-01',freq='W',periods=100)])
number_series = number_series.apply(lambda x: float(x))

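def func(s):
    # with raw=True below, s is a NumPy array, so s[-1] is positional
    if s[-1] > s[-2] > s[-3]:
        return 'High'
    elif s[-1] > s[-2]:
        return 'Medium'
    else:
        return 'Low'

l = []
# the numeric result of rolling.apply is discarded; the labels end up in l
number_series.rolling(5).apply(lambda x: l.append(func(x)) or 0, raw=True)

# func only runs for complete windows, so the labels belong to the last len(l) dates
new_series = pd.Series(l, index=number_series.index[-len(l):])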
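A quick sketch for checking that the collected labels line up with the right timestamps. Rolling.apply only calls the function for windows with at least min_periods observations (5 by default here), so with 100 rows it runs 96 times, and the first label belongs to the fifth date, not the first. The sketch assumes a plain DatetimeIndex and raw=True.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100).astype(float),
    index=pd.date_range(start='2000-01-01', freq='W', periods=100),
)

def func(s):
    # s is a NumPy array because raw=True is passed below
    if s[-1] > s[-2] > s[-3]:
        return 'High'
    elif s[-1] > s[-2]:
        return 'Medium'
    return 'Low'

labels = []
# `None or 0` evaluates to 0, so rolling.apply always receives a number
number_series.rolling(5).apply(lambda x: labels.append(func(x)) or 0, raw=True)

print(len(labels))  # 96, one label per complete window

# Align the labels with the last len(labels) timestamps
new_series = pd.Series(labels, index=number_series.index[-len(labels):])
```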
