
Say I have a Dataset like this:

import numpy as np
import xray

nx1, nx2, nx3 = 5, 3, 20

ds = xray.Dataset()
ds.coords.update({'x1': ('x1', range(nx1)), 
                  'x2': ('x2', range(nx2)), 
                  'x3': ('x3', range(nx3))})

ds['A'] = (['x1', 'x2', 'x3'], np.random.randn(nx1, nx2, nx3))
ds['B'] = (['x1', 'x2', 'x3'], np.random.randn(nx1, nx2, nx3))

and a function func that takes the variables A and B as input but works only along the x3 dimension: it accepts arrays of shape (nx3,) and returns an array of shape (nx3,). If I want to apply this function to the dataset above and save the result as a new variable named C, is the following the way to do it,

required_shape = (len(ds.coords['x1']), 
                  len(ds.coords['x2']),
                  len(ds.coords['x3']))

if 'C' not in ds:
    ds.update({'C': (['x1', 'x2', 'x3'], np.zeros(required_shape))})

for ix1, x1 in enumerate(ds.coords['x1']):
    for ix2, x2 in enumerate(ds.coords['x2']):
        args = dict(x1=ix1, x2=ix2)
        a = ds['A'][args]
        b = ds['B'][args]
        c = func(a.values, b.values)
        ds['C'][args] = c

by initialising a new array in the dataset and using for-loops over the other dimensions?

qAp

1 Answer


I'm not big on pandas, but a general solution that also works for other data types is to use a comprehension, which gets rid of both the nested loops and the initialization step.

required_shape = (len(ds.coords['x1']), 
                  len(ds.coords['x2']),
                  len(ds.coords['x3']))

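# Build C one (x1, x2) slice at a time, then restore the (x1, x2, x3) shape.
ds['C'] = (['x1', 'x2', 'x3'], np.array([
    func(ds['A'][args].values, ds['B'][args].values)
    for ix1, x1 in enumerate(ds.coords['x1'])
    for ix2, x2 in enumerate(ds.coords['x2'])
    # single-element loop: binds args for the expression above
    for args in (dict(x1=ix1, x2=ix2),)]).reshape(required_shape))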
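
If the goal is only to avoid writing the loops by hand, one possible alternative is NumPy's `np.vectorize` with a `signature` argument, which treats the last axis as the one `func` works on and loops over the leading axes for you. The sketch below uses plain NumPy arrays and a made-up `func` (a stand-in for the real one, which isn't shown in the question), and checks the result against explicit loops. It is a convenience, not a speed-up: `np.vectorize` still calls `func` once per (x1, x2) pair.

```python
import numpy as np


def func(a, b):
    # Hypothetical stand-in for the question's func:
    # it only accepts 1-D arrays and returns a 1-D array of the same length.
    if a.ndim != 1 or b.ndim != 1:
        raise ValueError("func expects 1-D inputs")
    return np.cumsum(a) * b


nx1, nx2, nx3 = 5, 3, 20
rng = np.random.default_rng(0)
A = rng.standard_normal((nx1, nx2, nx3))
B = rng.standard_normal((nx1, nx2, nx3))

# "(n),(n)->(n)" marks the last axis as the core dimension;
# np.vectorize then loops over the remaining (leading) axes.
func_along_x3 = np.vectorize(func, signature="(n),(n)->(n)")
C = func_along_x3(A, B)

# Reference result from explicit loops, for comparison.
C_loop = np.empty_like(A)
for i in range(nx1):
    for j in range(nx2):
        C_loop[i, j] = func(A[i, j], B[i, j])
```

With the question's dataset, the same call on `ds['A'].values` and `ds['B'].values` gives an array that can be assigned as `ds['C'] = (['x1', 'x2', 'x3'], C)`.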

EDIT: Incidentally,

ds['C'] = func(ds['A'], ds['B'])

seems to work just fine for a simple element-wise function like:

def func(a, b):
    return a + b
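
That shortcut relies on `func` being happy with whole arrays. For a function that really only accepts 1-D input, a rough sketch of another option is `apply_ufunc` with `vectorize=True`. This assumes the modern `xarray` package (the successor to xray) is installed; as far as I know the old xray releases don't have it. The `func` here is again a made-up placeholder.

```python
import numpy as np
import xarray as xr  # assumption: modern xarray, not the old xray package


def func(a, b):
    # Hypothetical stand-in: only accepts 1-D arrays of equal length.
    if a.ndim != 1 or b.ndim != 1:
        raise ValueError("func expects 1-D inputs")
    return np.cumsum(a) * b


nx1, nx2, nx3 = 5, 3, 20
rng = np.random.default_rng(0)
dims = ["x1", "x2", "x3"]
ds = xr.Dataset(
    {
        "A": (dims, rng.standard_normal((nx1, nx2, nx3))),
        "B": (dims, rng.standard_normal((nx1, nx2, nx3))),
    },
    coords={"x1": np.arange(nx1), "x2": np.arange(nx2), "x3": np.arange(nx3)},
)

# x3 is declared as the core dimension of both inputs and of the output;
# vectorize=True makes xarray loop over x1 and x2 and hand func 1-D slices.
ds["C"] = xr.apply_ufunc(
    func,
    ds["A"],
    ds["B"],
    input_core_dims=[["x3"], ["x3"]],
    output_core_dims=[["x3"]],
    vectorize=True,
)
```

The result keeps the dimension names and coordinates, so no manual reshape or pre-allocated array is needed.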
Sari