
I have two datasets that I want to run a moving window over, calculating the R^2 value for each window. What I have is:

import numpy as np
from scipy.ndimage import generic_filter
from sklearn.metrics import r2_score

data1 = np.random.randint(1,100,size=(100,50))
data2 = np.random.randint(1,100,size=(100,50))

array = np.ones(np.shape(data1))

def function(array):
    return r2_score(data1,data2)

window = generic_filter(array,function,footprint=np.ones((3,3)),mode='nearest')

So basically an R^2 value should be calculated for each 3x3 box, and that value placed at the center of where that box was, like:

0 0 0
0 R^2 0
0 0 0

And then it would move over to the next point and calculate another R^2, and so on for the whole array of size (100x50). When I run this, though, it creates an array that's 100x50 but every element has the same value. I think it's calculating the R^2 for the entire data1 and data2 instead of for each footprint. I'm not exactly sure what I need to pass into function instead of array.

Ehsan
Overtime4728
  • your `function` is not taking into account the kernel produced by `generic_filter`, thus it is using full `data1` and `data2` for every kernel... and I believe there is no way to do this with `generic_filter`, see this [answer](https://stackoverflow.com/a/4947453/6692898) to get sliding windows of an ndarray, maybe `zip` the windows for both inputs, and append r2 to a list on each iteration (possibly reshape at the end) – RichieV Sep 03 '20 at 19:13
  • also take a look at this [question](https://stackoverflow.com/q/1819124/6692898) – RichieV Sep 03 '20 at 19:26

1 Answer


Here is a loop version of it:

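from skimage.util import view_as_windows

# 3x3 sliding windows; for (100, 50) inputs the shape is (98, 48, 3, 3)
data1_w = view_as_windows(data1, (3,3))
data2_w = view_as_windows(data2, (3,3))
r, c, _, _ = data1_w.shape
# float output array; zeros_like on the integer input would truncate the scores
r2 = np.zeros((r, c))
for i in range(r):
  for j in range(c):
    r2[i,j] = r2_score(data1_w[i,j,:,:],data2_w[i,j,:,:])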

I am not familiar with the r2_score function, but I advise writing the computation in array form to avoid looping. data1_w and data2_w give you the moving windows you need for array operations.
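As a rough sketch of what such an array-based version might look like: the snippet below assumes the usual definition R^2 = 1 - SS_res / SS_tot, computed over all nine values of each window taken together. The helper name `windowed_r2` is hypothetical, and NumPy's `sliding_window_view` (NumPy 1.20 or later) is swapped in for `view_as_windows` so the example depends on NumPy alone. Returning NaN for windows where the first array is constant is a choice made for this sketch, not necessarily what `r2_score` does.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_r2(y_true, y_pred, size=3):
    """R^2 of y_pred against y_true in every size x size window (no edge padding)."""
    # Window views have shape (rows - size + 1, cols - size + 1, size, size).
    t = sliding_window_view(y_true.astype(float), (size, size))
    p = sliding_window_view(y_pred.astype(float), (size, size))
    # Residual and total sums of squares, reduced over each window.
    ss_res = ((t - p) ** 2).sum(axis=(-2, -1))
    ss_tot = ((t - t.mean(axis=(-2, -1), keepdims=True)) ** 2).sum(axis=(-2, -1))
    # Assumption for this sketch: a constant window (ss_tot == 0) yields NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(ss_tot > 0, 1.0 - ss_res / ss_tot, np.nan)


rng = np.random.default_rng(0)
data1 = rng.integers(1, 100, size=(100, 50))
data2 = rng.integers(1, 100, size=(100, 50))
r2 = windowed_r2(data1, data2)  # shape (98, 48)
```

Note that passing 2-D windows straight to `r2_score` treats each column as a separate output and averages the per-column scores, so the loop above and this sketch will generally not produce identical numbers.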

Ehsan
  • Thanks for your help! But the output data looks off. The values range from -1882 to -24, but the R^2 should only be from 0 to 1. I know you said you are not familiar with 'r2_score', but do you have any idea why this is happening? Is this likely from the 'r2_score' function or something else? – Overtime4728 Sep 03 '20 at 22:42
  • @Overtime4728 According to the docs (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html), the score can be negative. Check out their implementation. As for the dimension, you were right; I edited it. – Ehsan Sep 03 '20 at 22:47
  • @Overtime4728 Mind that `r2` should be smaller by 2 in each dimension due to the edges. Feel free to accept the answer if it resolves your issue. Thank you. You can certainly implement an array-based version of it if you know the formula. – Ehsan Sep 03 '20 at 22:51
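
To illustrate the point about negative scores from the comments above, here is a minimal pure-Python sketch, assuming the standard definition R^2 = 1 - SS_res / SS_tot. The helper `r2` is written out by hand for illustration and is not scikit-learn's implementation. The score drops below zero whenever the predictions fit worse than simply predicting the mean of the true values, which is the expected outcome for two unrelated random arrays.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


perfect = r2([1, 2, 3], [1, 2, 3])    # 1.0: predictions match exactly
mean_only = r2([1, 2, 3], [2, 2, 2])  # 0.0: no better than predicting the mean
reversed_ = r2([1, 2, 3], [3, 2, 1])  # -3.0: worse than predicting the mean
```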