I have a generic DataFrame with float values and no NaNs or Infs. I want to compute a rolling Z-score over the column Values, and I tried to use SciPy's zscore for it.

This works, but it computes the Z-score over the whole column, i.e. it is not rolling:

from scipy.stats import zscore
df['Z-Score'] = zscore(df['Values'])

This is what I want to do, but it gives me an error:

from scipy.stats import zscore
window_size = 5
df['Z-Score'] = df['Values'].rolling(window_size).apply(lambda s: zscore(s))

I get TypeError: cannot convert the series to <class 'float'>.

I've searched over and over but can't find what the issue is. What am I doing wrong?


I know I can implement the Z-score function myself, which would be more performant, but I'd rather use a library.

Nermin
  • @GoldenLion I'm not sure if I understand your comment completely. Would you mind posting an answer with the suggested code changes? – Nermin Apr 08 '23 at 07:07
  • what is your data? z-scores are produced from variance, so the first row will not have a z-score – Golden Lion Apr 10 '23 at 17:21

1 Answer

Pandas' Rolling.apply() expects a function that outputs a scalar. From the docs:

func : function. Must produce a single value from an ndarray input if raw=True or a single value from a Series if raw=False. Can also accept a Numba JIT function with engine='numba' specified.
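To make that contract concrete, here is a minimal sketch with made-up data. The helper name window_range is hypothetical and only serves to show a callable that reduces each window to one number:

```python
import numpy as np
import pandas as pd

# Made-up data, for illustration only.
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])

# With raw=True the callable receives each window as a NumPy ndarray
# and must hand back exactly one number for that window.
def window_range(w: np.ndarray) -> float:
    return float(w.max() - w.min())

ranges = s.rolling(3).apply(window_range, raw=True)
print(ranges.tolist())
```

zscore, by contrast, returns one value per element of its input, so it cannot be passed to apply() unchanged.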

You need to rethink what you really want this computation to do. What exactly is the Z-score of a window of your data? Do you want an average Z-score? Do you want a Z-score based on the distribution of your entire data or just the window? As it is, I can't really understand what you are trying to do.
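If the intent is the Z-score of the newest value in each window, measured against that window's own mean and standard deviation, then one possible sketch is to keep only the last element that zscore returns. This interpretation is an assumption, not something the question states, and the data below is made up:

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Made-up data, for illustration only.
df = pd.DataFrame({"Values": [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]})
window_size = 5

# zscore returns one score per element of the window; keep only the last,
# i.e. the score of the newest value relative to its own window.
df["Z-Score"] = df["Values"].rolling(window_size).apply(
    lambda w: zscore(w)[-1], raw=True
)

# The same quantity without SciPy. ddof=0 matches zscore's default,
# whereas pandas' rolling std defaults to ddof=1.
roll = df["Values"].rolling(window_size)
df["Z-Score (pandas)"] = (df["Values"] - roll.mean()) / roll.std(ddof=0)

print(df)
```

The first window_size - 1 rows are NaN in both columns because those windows are incomplete. A window whose values are all identical has zero standard deviation, so its score is undefined in either version.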

foglerit
  • There are legitimate reasons for computing a rolling Z-score, as shown here: https://stackoverflow.com/questions/47164950/compute-rolling-z-score-in-pandas-dataframe and https://stackoverflow.com/questions/59596912/rolling-z-score-applied-to-pandas-dataframe. The only difference is that, instead of implementing my own Z-score function, I'd like to use SciPy's. Is there any way to do that? – Nermin Apr 10 '23 at 12:23