
I have files which are made of 10 ensembles and 35 time files. One of these files looks like:

>>> xr.open_dataset('ens1/CCSM4_ens1_07ic_19820701-19820731_NPac_Jul.nc')
<xarray.Dataset>
Dimensions:    (ensemble: 1, latitude: 66, longitude: 191, time: 31)
Coordinates:
  * ensemble   (ensemble) int32 1
  * latitude   (latitude) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
  * longitude  (longitude) float32 100.0 101.0 102.0 103.0 104.0 105.0 106.0 ...
  * time       (time) datetime64[ns] 1982-07-01 1982-07-02 1982-07-03 ...
Data variables:
    u10m       (time, latitude, longitude) float64 -1.471 -0.05933 -1.923 ...
Attributes:
    CDI:                       Climate Data Interface version 1.6.5 (http://c...
    history:                   Wed Nov 22 21:54:08 2017: ncks -O -d longitude...
    Conventions:               CF-1.4
    CDO:                       Climate Data Operators version 1.6.5 (http://c...
    nco_openmp_thread_number:  1
    NCO:                       4.3.7

When I use open_mfdataset, the files are concatenated along the time dimension and the ensemble dimension is dropped (possibly because it has a size of 1?):

>>> xr.open_mfdataset('ens*/*NPac*.nc')
<xarray.Dataset>
Dimensions:    (latitude: 66, longitude: 191, time: 10850)
Coordinates:
  * latitude   (latitude) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
  * longitude  (longitude) float32 100.0 101.0 102.0 103.0 104.0 105.0 106.0 ...
  * time       (time) datetime64[ns] 1982-07-01 1982-07-02 1982-07-03 ...
Data variables:
    u10m       (time, latitude, longitude) float64 -1.471 -0.05933 -1.923 ...

Is it possible to concatenate along the ensemble dimension as well?

I did a simple test using merge, as suggested in "Error on using xarray open_mfdataset function", but it fails:

>>> ds = xr.open_mfdataset('ens1/*NPac*')
<xarray.Dataset>
Dimensions:    (ensemble: 1, latitude: 66, longitude: 191, time: 1085)
Coordinates:
  * ensemble   (ensemble) int32 1
  * latitude   (latitude) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
  * longitude  (longitude) float32 100.0 101.0 102.0 103.0 104.0 105.0 106.0 ...
  * time       (time) datetime64[ns] 1982-07-01 1982-07-02 1982-07-03 ...
Data variables:
    u10m       (time, latitude, longitude) float64 -1.471 -0.05933 -1.923 ...
>>> ds2 = xr.open_mfdataset('ens2/*NPac*')
<xarray.Dataset>
Dimensions:    (ensemble: 1, latitude: 66, longitude: 191, time: 1085)
Coordinates:
  * ensemble   (ensemble) int32 2
  * latitude   (latitude) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
  * longitude  (longitude) float32 100.0 101.0 102.0 103.0 104.0 105.0 106.0 ...
  * time       (time) datetime64[ns] 1982-07-01 1982-07-02 1982-07-03 ...
Data variables:
    u10m       (time, latitude, longitude) float64 3.992 2.099 -0.3162 ...
>>> ds3 = xr.merge([ds, ds2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nethome/rxb826/local/bin/miniconda3/lib/python3.6/site-packages/xarray/core/merge.py", line 513, in merge
    variables, coord_names, dims = merge_core(dict_like_objects, compat, join)
  File "/nethome/rxb826/local/bin/miniconda3/lib/python3.6/site-packages/xarray/core/merge.py", line 432, in merge_core
    variables = merge_variables(expanded, priority_vars, compat=compat)
  File "/nethome/rxb826/local/bin/miniconda3/lib/python3.6/site-packages/xarray/core/merge.py", line 166, in merge_variables
    merged[name] = unique_variable(name, variables, compat)
  File "/nethome/rxb826/local/bin/miniconda3/lib/python3.6/site-packages/xarray/core/merge.py", line 85, in unique_variable
    % (name, out, var))
xarray.core.merge.MergeError: conflicting values for variable 'u10m' on objects to be combined:
first value: <xarray.Variable (time: 1085, latitude: 66, longitude: 191)>
dask.array<shape=(1085, 66, 191), dtype=float64, chunksize=(31, 66, 191)>
Attributes:
    long_name:  10m U component of wind
    units:      m s**-1
second value: <xarray.Variable (time: 1085, latitude: 66, longitude: 191)>
dask.array<shape=(1085, 66, 191), dtype=float64, chunksize=(31, 66, 191)>
Attributes:
    long_name:  10m U component of wind
    units:      m s**-1

I'm using xarray v0.10.0 (thanks for the recent update!)

Ray Bell

3 Answers


xarray.open_mfdataset does not support 2-D merges. Instead, open each ensemble separately with open_mfdataset and then use concat along the second dimension:

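import os
import xarray as xr

# Open each ensemble member separately; open_mfdataset concatenates along time.
ens_list = []
for num in range(1, 11):
    ens = 'ens%d' % num
    ens_list.append(xr.open_mfdataset(os.path.join(ens, '*NPac*')))

# Then stack the ten members along the ensemble dimension.
ds = xr.concat(ens_list, dim='ensemble')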

This is a common problem that xarray users run into. It is quite difficult, however, to write a generalized ND concat routine.
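
If the files carry no ensemble dimension at all (only time, latitude and longitude), the same concat call can create one. Here is a minimal sketch using small synthetic datasets in place of the netCDF files; the helper make_member, the array sizes and the member labels 1 and 2 are made up for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_member(seed):
    """Hypothetical stand-in for one ensemble member with time/lat/lon only."""
    rng = np.random.default_rng(seed)
    time = pd.date_range("1982-07-01", periods=3)
    lat = np.arange(2.0)
    lon = np.arange(100.0, 103.0)
    data = rng.normal(size=(time.size, lat.size, lon.size))
    return xr.Dataset(
        {"u10m": (("time", "latitude", "longitude"), data)},
        coords={"time": time, "latitude": lat, "longitude": lon},
    )


members = [make_member(seed) for seed in (1, 2)]

# A named pandas Index both creates the new dimension and labels its entries.
ensemble_index = pd.Index([1, 2], name="ensemble")
combined = xr.concat(members, dim=ensemble_index)
```

With real files, each entry of members would be the result of an open_mfdataset call on one ensemble directory, as in the loop above.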

jhamman
  • Thanks for the solution, @jhamman. How would your solution change if the netCDF files do not have an `ensemble` dimension, and just have `lat`, `lon` and `time`? – user308827 Aug 03 '18 at 03:11

I wrote the following function as a workaround for my own use case: https://gist.github.com/jnhansen/fa474a536201561653f60ea33045f4e2

It works with arbitrary dimensions, but currently requires that the same variables exist in each file/dataset.

In my case I have a number of tiles (split along e.g. lat, lon, and time):

ds = auto_merge('data/part*.nc')

This returns immediately because it only builds a lazy view of the data (just as xarray.open_mfdataset does).

jhansen
  • thanks @jhansen, so in the example in this post, does that mean `u10m` should exist in each netCDF file? That seems like a reasonable assumption – user308827 Aug 03 '18 at 19:36
  • Can you add a small example of how to call your code? I will test with my own datasets as well and accept. Thanks! – user308827 Aug 03 '18 at 19:37
  • Sure, I added a single line example in my reply. Should be as simple as that! And yes, exactly, the variable `u10m` (or whatever) should exist in each file. I haven't actually tested what happens if it doesn't... – jhansen Aug 04 '18 at 20:10

xarray now supports N-D concatenation. As your data has 1-D dimension coordinates, you can simply do:

ds = xr.open_mfdataset('ens*/*NPac*.nc', combine='by_coords')

and it should combine them in order automatically! It should even work for the ensemble dimension, as you gave that a coordinate too.
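
To see the coordinate-based ordering without touching any files, here is a small sketch using xr.combine_by_coords on in-memory datasets. The helper make_tile and the tiny grids are invented for illustration, and unlike the files in the question, u10m is given an explicit ensemble dimension here to keep the example unambiguous:

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_tile(member, start):
    """Hypothetical tile: one ensemble member over a two-day time window."""
    time = pd.date_range(start, periods=2)
    lat = np.arange(2.0)
    lon = np.arange(100.0, 103.0)
    # Fill with the member number so the result is easy to check.
    data = np.full((1, time.size, lat.size, lon.size), float(member))
    return xr.Dataset(
        {"u10m": (("ensemble", "time", "latitude", "longitude"), data)},
        coords={
            "ensemble": [member],
            "time": time,
            "latitude": lat,
            "longitude": lon,
        },
    )


# Deliberately shuffled: the order is recovered from the coordinate values.
tiles = [
    make_tile(2, "1982-07-03"),
    make_tile(1, "1982-07-01"),
    make_tile(2, "1982-07-01"),
    make_tile(1, "1982-07-03"),
]
combined = xr.combine_by_coords(tiles)
```

The latitude and longitude coordinates are identical in every tile, so only ensemble and time are used as concatenation dimensions.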

Also see this answer to a very similar question.

ThomasNicholas