17

Is there a way to merge two or more netCDF files with the same time dimension but different spatial domains into a single netCDF file? The spatial domains are specified by latitude and longitude coordinates. The xarray documentation for concat, merge, etc. says that they work along a single dimension.

user308827
  • Can you describe how your data looks and how you would like it to look at the end? Will it look like T, Lat1, Lon1, Lat2, Lon2, Lat3, Lon3 which basically means that you are doing a join on the time dimension? – doodhwala Jul 27 '18 at 02:30

3 Answers

6

My understanding of your question is that you want to open multiple netCDF files that contain different spatial sections of your data, where the overall dataset has been broken up along both lat and lon.

If that's the case, then I'm afraid xarray doesn't support this at the moment. I asked about exactly the same issue on the xarray GitHub here.

The same thing was also asked about on SO here. The concat solution mentioned there will work.
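For reference, here is a minimal sketch of what a nested concat can look like. It is not the code from the linked answer: the tiles are small in-memory datasets standing in for the files, and the names (make_tile, temp, the four tile variables) are made up for illustration. It assumes every tile shares the same time axis and that tiles in the same row share the same lat values.

```python
import numpy as np
import xarray as xr


def make_tile(lats, lons, ntime=3):
    """Build a small in-memory dataset standing in for one netCDF file."""
    data = np.random.rand(ntime, len(lats), len(lons))
    return xr.Dataset(
        {"temp": (("time", "lat", "lon"), data)},
        coords={"time": np.arange(ntime), "lat": lats, "lon": lons},
    )


# Four hypothetical tiles covering a 2 x 2 arrangement of sub-domains.
south_west = make_tile([0.0, 1.0], [10.0, 11.0])
south_east = make_tile([0.0, 1.0], [12.0, 13.0])
north_west = make_tile([2.0, 3.0], [10.0, 11.0])
north_east = make_tile([2.0, 3.0], [12.0, 13.0])

# Join each row of tiles along lon, then stack the two rows along lat.
south = xr.concat([south_west, south_east], dim="lon")
north = xr.concat([north_west, north_east], dim="lon")
combined = xr.concat([south, north], dim="lat")
```

With real files you would open each one with xr.open_dataset first and concatenate the resulting datasets in the same way.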

In my case I then wanted to save the concatenated dataset to a single new netCDF file, but using this method ended up loading all the data into memory at once. To get around this I ended up having to use the netcdf python library to solve this at a lower level, but it took a lot of effort.

ThomasNicholas
  • 1
    thanks @Thomas, happy to accept your answer if you can share the code – user308827 Aug 03 '18 at 03:10
  • 1
    @user308827 I have put the code up [here](https://pastebin.com/KwukzdY6) for you, but it's 300 lines of code which are somewhat specific to what I'm doing. A lot of the code is actually for intelligently deciding which fields from the .nc files should be saved where. I also don't really think it's a particularly great way of solving the problem, but it's working for me at the moment. If you want to collaborate on a better solution then I would be open to that. – ThomasNicholas Aug 03 '18 at 17:24
  • 1
    Let me know if you need any of it explained! – ThomasNicholas Aug 03 '18 at 19:03
  • 2
    Also someone just posted a third solution to the [issue thread](https://github.com/pydata/xarray/issues/2159) on the xarray github. – ThomasNicholas Aug 03 '18 at 19:47
  • 1
    @user308827 there is now a much better way to solve this - see my new answer – ThomasNicholas Dec 17 '19 at 15:35
5

xarray now supports multi-dimensional concatenation directly through open_mfdataset.

The documentation on combining data along multiple dimensions is here, but as your question is very similar to this one, I'm going to copy the key parts of my answer here:


You have a 2D concatenation problem: you need to arrange the datasets such that when joined up along x and y, they make a larger dataset which also has dimensions x and y.

As long as len(x) is the same in every file, and len(y) is the same in every file, you should in theory be able to do this in one of two different ways.

1) Using combine='nested'

You can manually specify the order that you need them joined up in. xarray allows you to do this by passing the datasets as a grid, specified as a nested list. In your case, if we had 4 files (named [upper_left, upper_right, lower_left, lower_right]), we would combine them like so:

from xarray import open_mfdataset

grid = [[upper_left, upper_right], 
        [lower_left, lower_right]]

ds = open_mfdataset(grid, concat_dim=['x', 'y'], combine='nested')

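We had to tell open_mfdataset which dimensions of the data the rows and columns of the grid correspond to, so that it knows which dimensions to concatenate the data along. That's why we needed to pass concat_dim=['x', 'y']. The entries of concat_dim are matched to the nesting levels of the list from the outside in: the first entry is the dimension along which the outer list is joined, and the second is the dimension along which the datasets inside each inner list are joined.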
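To make the ordering rule concrete, here is a small sketch using xr.combine_nested, which takes the same nested list and concat_dim arguments but works on datasets already in memory. The tiles and the make_tile helper are made up for illustration. I'm assuming here that y runs from top to bottom and x from left to right, so the order comes out as ['y', 'x']; your files may be laid out differently, in which case the order changes accordingly.

```python
import numpy as np
import xarray as xr


def make_tile(fill, ny=2, nx=3):
    """Hypothetical tile with no coordinates, filled with a single value."""
    return xr.Dataset({"field": (("y", "x"), np.full((ny, nx), fill))})


upper_left, upper_right = make_tile(1), make_tile(2)
lower_left, lower_right = make_tile(3), make_tile(4)

grid = [[upper_left, upper_right],
        [lower_left, lower_right]]

# The outer list (top row, bottom row) is joined along "y";
# the datasets inside each inner list (left, right) are joined along "x".
ds = xr.combine_nested(grid, concat_dim=["y", "x"])
```

Each tile is filled with a different constant, so printing ds["field"].values shows directly which tile ended up in which corner.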

2) Using combine='by_coords'

But your data has coordinates in it already - can't xarray just use those to arrange the datasets in the right order? That is what the combine='by_coords' option is for, but unfortunately, it requires 1-dimensional coordinates (also known as dimensional coordinates) to arrange the data. If your files don't have any of those, the printout will say (Dimensions without coordinates: x, y).

If you can add 1-dimensional coordinates to your files first, then you can use combine='by_coords' and just pass a list of all the files in any order, i.e.

ds = open_mfdataset([file1, file2, ...], combine='by_coords')

But otherwise you'll have to use combine='nested'.
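As a rough illustration of the by_coords route, the sketch below adds 1-dimensional coordinates to tiles that start without any, then combines them in a deliberately shuffled order with xr.combine_by_coords (the in-memory counterpart of combine='by_coords'). The bare_tile helper and the layout table are made up; in practice the origin of each tile would have to come from somewhere you trust, such as the file names or metadata.

```python
import numpy as np
import xarray as xr


def bare_tile(value, n=2):
    """Hypothetical n x n tile with dimensions x and y but no coordinates."""
    return xr.Dataset({"field": (("y", "x"), np.full((n, n), float(value)))})


# (fill value, y origin, x origin) for each tile, listed in no particular order.
layout = [(4, 2, 2), (1, 0, 0), (3, 2, 0), (2, 0, 2)]

# Attach 1-dimensional coordinates so xarray can work out where each tile goes.
tiles = [
    bare_tile(v).assign_coords(y=np.arange(y0, y0 + 2), x=np.arange(x0, x0 + 2))
    for v, y0, x0 in layout
]

ds = xr.combine_by_coords(tiles)
```

The order of the list does not matter here, because the placement is taken from the x and y coordinate values rather than from the position in the list.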

ThomasNicholas
  • @user308827 there is now a much better way to solve this – ThomasNicholas Dec 17 '19 at 15:34
  • thanks @Thomas, what do you mean by `but unfortunately, it requires 1-dimensional coordinates (also known as dimensional coordinates) to arrange the data`? Does that mean it will not work for 2-dimensional data? – user308827 Dec 18 '19 at 03:09
  • You're fine to combine 2D data, but you'll need to use the `combine='nested'` option and tell it exactly what order the datasets need to be in. All I meant is that you can't do it the fully-automated way (i.e. combine='by_coords') without 1D coordinates in your datasets. – ThomasNicholas Jan 11 '20 at 17:59
  • What are my options if i have different dimension sizes for latitude & longitude and also have different indexes ;)? – till Kadabra Feb 04 '20 at 22:38
  • 1
    @tillKadabra I'm not entirely clear what you mean, but xarray can only help you stitch together a set of rectangles of equal (lat, lon) dimensions side-by-side, or stack a set of rectangles of equal (lat, lon) dimensions on top of one another by creating a new dimension. You can't join up uneven grids of datasets, or have so-called ragged arrays. – ThomasNicholas Feb 05 '20 at 23:16
1

I don't know of an "automated" way to do this in Python (or R, Fortran), only manually reading the files into a larger array and then writing that array out to a new netCDF file, but there is a more "automated" way to do it from the command line using CDO.

If you define a domain description file grid.txt that covers the regions of the two (or more) files:

gridtype = lonlat
gridsize = 420
xname = lon
xlongname = longitude
xunits = degrees east
yname = lat
ylongname = latitude
yunits = degrees north
xsize = 21
ysize = 20
xfirst = -11.0
xinc = 1
yfirst = -20.0
yinc = 1

and then you "expand" the first file file1.nc to the larger domain and then merge in the contents of both netcdf files:

cdo expand,grid.txt file1.nc large.nc
cdo mergegrid large.nc file1.nc merge1.nc
cdo mergegrid merge1.nc file2.nc final_merge.nc 

I found this solution here: https://code.mpimet.mpg.de/boards/1/topics/26 and have used it when I need to merge 2 or 3 files together. However, when I needed to merge many hundreds of files together, each containing e.g. one latitude row of data, I wrote a manual programme (in R in my case).
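For what it's worth, the manual approach can be sketched in a few lines of Python with numpy. This is not my R programme, just an illustration of the idea: the target grid reuses the values from grid.txt above, the paste helper is made up, and the two tiles are constant arrays standing in for data read from file1.nc and file2.nc. It assumes every tile lies exactly on the cells of the target grid.

```python
import numpy as np

# Target grid, using the xfirst/xinc/xsize and yfirst/yinc/ysize values from grid.txt.
lon = -11.0 + 1.0 * np.arange(21)
lat = -20.0 + 1.0 * np.arange(20)
merged = np.full((lat.size, lon.size), np.nan)


def paste(merged, tile, tile_lat, tile_lon):
    """Copy one regular tile into the merged array at its matching lat/lon cells."""
    i0 = int(np.searchsorted(lat, tile_lat[0]))
    j0 = int(np.searchsorted(lon, tile_lon[0]))
    merged[i0:i0 + tile.shape[0], j0:j0 + tile.shape[1]] = tile


# Two hypothetical tiles: the western 10 columns and the eastern 11 columns.
paste(merged, np.ones((20, 10)), lat, lon[:10])
paste(merged, np.full((20, 11), 2.0), lat, lon[10:])
```

Cells not covered by any tile stay NaN, so gaps in the merged domain are easy to spot before writing the array out.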

ClimateUnboxed