I have the following xarray DataArray named foo.

<xarray.DataArray (time: 4, lat: 3, lon: 2)>
array([[[0.061686, 0.434164],
        [0.642003, 0.78744 ],
        [0.068701, 0.526546]],

       [[0.53612 , 0.549919],
        [0.172044, 0.118106],
        [0.381638, 0.736584]],

       [[0.688589, 0.173351],
        [0.03593 , 0.833743],
        [0.667719, 0.890957]],

       [[0.712785, 0.04725 ],
        [0.132689, 0.938043],
        [0.681481, 0.67986 ]]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * lat      (lat) <U2 'IA' 'IL' 'IN'
  * lon      (lon) <U2 '00' '22'

I need to apply scipy.stats.percentileofscore along the time dimension while resampling into 48-hour bins.

from scipy import stats
foo.resample(time='48H').reduce(stats.percentileofscore, dim='time', score=0.1)

I received the following error:

\variable.py", line 1354, in reduce
    axis=axis, **kwargs)
TypeError: percentileofscore() got an unexpected keyword argument 'axis'
alextc

1 Answer

Data for reproduction:

import numpy as np
import pandas as pd
import xarray as xr

array = np.array([[[0.061686, 0.434164],
        [0.642003, 0.78744 ],
        [0.068701, 0.526546]],

       [[0.53612 , 0.549919],
        [0.172044, 0.118106],
        [0.381638, 0.736584]],

       [[0.688589, 0.173351],
        [0.03593 , 0.833743],
        [0.667719, 0.890957]],

       [[0.712785, 0.04725 ],
        [0.132689, 0.938043],
        [0.681481, 0.67986 ]]])

lat = ['IA','IL','IN']
lon = ['00','22']

times = pd.date_range('2000-01-01', periods=4, freq='H')  # hourly timestamps

foo = xr.DataArray(array, coords=[times, lat, lon], dims=['time', 'lat', 'lon'])

The function you need:

from scipy import stats
import numpy as np

def func(x, axis, score):
    # reduce() supplies `axis`; apply the 1-D percentileofscore along that axis
    return np.apply_along_axis(stats.percentileofscore, axis, x, score)

res = foo.resample(time='2H').reduce(func, score=0.2)  # one bin per 2 hours

The output:

<xarray.DataArray (time: 2, lat: 3, lon: 2)>
array([[[ 50.,   0.],
        [ 50.,  50.],
        [ 50.,   0.]],

       [[  0., 100.],
        [100.,   0.],
        [  0.,   0.]]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-01T02:00:00
  * lat      (lat) <U2 'IA' 'IL' 'IN'
  * lon      (lon) <U2 '00' '22'

Explanation

What reduce expects from the function, and what it passes in:

def func2(x, axis):  # reduce() calls the function with an `axis` argument (the cause of the error)
    print(x)  # show the input that our function receives
    return x  # the return value is not relevant here

foo.resample(time='2H').reduce(func2)

# Input to func2: two arrays, each of shape (2, 3, 2)
a = np.array([[[0.061686, 0.434164],
  [0.642003, 0.78744 ],
  [0.068701, 0.526546]],

 [[0.53612,  0.549919],
  [0.172044, 0.118106],
  [0.381638, 0.736584]]])

b = np.array([[[0.688589, 0.173351],
  [0.03593,  0.833743],
  [0.667719, 0.890957]],

 [[0.712785, 0.04725 ],
  [0.132689, 0.938043],
  [0.681481, 0.67986 ]]])
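
To make the shapes concrete, here is a minimal NumPy-only sketch that mimics the grouping. It is an emulation, not xarray's actual internals: np.split stands in for the resample grouping, and the placeholder values and the name `blocks` are assumptions for illustration.

```python
import numpy as np

# Placeholder data with the same layout as foo: (time, lat, lon) = (4, 3, 2)
data = np.arange(24, dtype=float).reshape(4, 3, 2)

# Emulate two resample bins of two time steps each (assumption: equal-sized bins)
blocks = np.split(data, 2, axis=0)

for block in blocks:
    print(block.shape)  # (2, 3, 2)

# An axis-aware reducer collapses only the time axis of each block
reduced = np.stack([block.mean(axis=0) for block in blocks])
print(reduced.shape)  # (2, 3, 2)
```

Each block still carries the lat and lon axes, so the reducer has to be told which axis to collapse.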

So what you are effectively doing is:

stats.percentileofscore(a, score=0.2)  # 200  # reduces over lat and lon as well
stats.percentileofscore(b, score=0.2)  # 200  # this raises another error

This is why you need a function that operates along a given axis (e.g. np.mean(a, axis=None, ...)). np.apply_along_axis helps with this task:

# Reduce over hours 1-2 and hours 3-4 (axis=0)
np.apply_along_axis(stats.percentileofscore, 0, a, score=0.2)
np.apply_along_axis(stats.percentileofscore, 0, b, score=0.2)
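
As a rough cross-check of the np.apply_along_axis approach, the sketch below uses NumPy only. The helpers `percentile_mean_1d` and `percentile_mean` are hypothetical names, not part of SciPy or xarray; they assume the kind='mean' definition from the SciPy docs (the average of the 'weak' and 'strict' percentiles), which matches the default kind='rank' whenever no value equals the score. The vectorized version also assumes `axis` is a single integer.

```python
import numpy as np

def percentile_mean_1d(values, score):
    """Hypothetical 1-D helper: average of the 'strict' and 'weak' percentiles."""
    values = np.asarray(values)
    strict = np.count_nonzero(values < score)
    weak = np.count_nonzero(values <= score)
    return 50.0 * (strict + weak) / values.size

def percentile_mean(x, axis, score):
    """Hypothetical vectorized reducer with the (x, axis, ...) signature."""
    x = np.asarray(x)
    strict = (x < score).sum(axis=axis)
    weak = (x <= score).sum(axis=axis)
    return 50.0 * (strict + weak) / x.shape[axis]  # assumes an integer axis

# First block from the explanation above (hours 1-2)
a = np.array([[[0.061686, 0.434164],
               [0.642003, 0.78744 ],
               [0.068701, 0.526546]],

              [[0.53612 , 0.549919],
               [0.172044, 0.118106],
               [0.381638, 0.736584]]])

looped = np.apply_along_axis(percentile_mean_1d, 0, a, 0.2)
vectorized = percentile_mean(a, axis=0, score=0.2)

print(looped)
# [[50.  0.]
#  [50. 50.]
#  [50.  0.]]
print(np.array_equal(looped, vectorized))  # True
```

The vectorized form avoids a Python-level call per (lat, lon) cell, which may matter on large grids; whether it is a drop-in replacement depends on which `kind` you need.
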
CamiloEr
  • Anyone getting this error with this solution? ValueError: The truth value of an array with more than one element is ambiguous. U – e5k Jul 11 '22 at 15:54