I have some multidimensional data and was wondering if I should use xarray when speed is a concern, albeit not my highest one.
I have a 4D array, so it's not so big as to preclude me from using NumPy. The coordinates/indices are vital for one dimension, but not so much for the others. I'll have to do some slight bookkeeping, but as the primary developer, that's fine for me. For the developers who iterate on the code after me, though, integer indexing might be more confusing than a label-based (xarray/pandas) approach. I could still use NumPy if I document the process well, but I would like to use xarray for readability.
After implementing a solution, I noticed that the label-based operations/indexing below complete in about 5 seconds on my machine:
    for isotope in isotopes:
        for height in heights:
            for assm in assemblies:
                da.loc[dict(power=['NW', 'NE', 'SW', 'SE'],
                            assembly=assm,
                            height=height,
                            isotope=isotope)] = [3, 5, 1, 20]
If I do the same thing with integer-based indexing on the xarray object, it takes about 2 seconds:
    for k, isotope in enumerate(isotopes):
        for j, height in enumerate(heights):
            for i, assm in enumerate(assemblies):
                da[i, [-4, -3, -2, -1], j, k] = [3, 5, 1, 20]
Lastly, I noticed that the same integer-based indexing in plain NumPy takes less than half a second:
    arr = np.zeros((44, 10, 22, 13))
    for k, isotope in enumerate(isotopes):
        for j, height in enumerate(heights):
            for i, assm in enumerate(assemblies):
                arr[i, [-4, -3, -2, -1], j, k] = [3, 5, 1, 20]
Speed is not my biggest concern here, but if the label-based approach in xarray is more than 8 times slower than standard NumPy integer-based indexing, and integer-based indexing in xarray is 4 times slower, that dissuades me from digging deeper into xarray for medium-rank multidimensional data.
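One observation worth adding: since the assigned values `[3, 5, 1, 20]` don't depend on the loop variables at all, the triple loop in each version can be collapsed into a single broadcast assignment, which sidesteps the per-iteration indexing overhead that dominates these timings. A minimal sketch for the NumPy case (using the same zero-initialized array as above):

```python
import numpy as np

arr = np.zeros((44, 10, 22, 13))

# One broadcast assignment replaces the triple loop: the values
# [3, 5, 1, 20] are written into the last four positions of axis 1
# for every (assembly, height, isotope) combination.
arr[:, -4:, :, :] = np.array([3, 5, 1, 20]).reshape(1, 4, 1, 1)
```

If the real workload does vary per iteration, the comparison stands as written, but for constant fills like this, both xarray and NumPy should complete in microseconds rather than seconds.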
Any thoughts, advice, etc?