Dear all, I need to create an xarray.DataArray whose dimension names are the same as its coordinate names, but I am not succeeding. Here is code that reproduces the problem:

import numpy as np
import xarray as xr

data = [[23, 22, 21],
       [22, 20, 24]]

x, y = np.meshgrid([-45, -44, -43], [-21, -20])

t2m = xr.DataArray(data=data,
                   dims=["lon", "lat"],
                   coords=dict(
                   lon=(["lon", "lat"], x),
                   lat=(["lon", "lat"], y)))

With this code I get the following error:

MissingDimensionsError: 'lon' has more than 1-dimension and the same name as one of its dimensions ('lon', 'lat'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

I can create the DataArray without this error simply by renaming the dimensions:

t2m = xr.DataArray(data=data,
               dims=["x", "y"],
               coords=dict(
               lon=(["x", "y"], x),
               lat=(["x", "y"], y)))

However, I would like to use the .sel method to extract values at given coordinates, and that only works if the dimensions are named after the coordinates and indexed by their values, for example:

t2m.sel(lon=-45, lat=-21, method='nearest')

Could someone help me with this? The netCDF files I download from internet sources (such as Copernicus netCDF files with ERA5 reanalysis data) come with coordinate names equal to the dimension names, and dimension values equal to the coordinate values, which allows using the .sel() method to extract data at a given (lon, lat) coordinate.

Thank you very much in advance.

2 Answers

As your code shows, your grid is regular, so you can use 1D coordinates and give them the same names as your dimensions:

import xarray as xr

data = [[23, 22, 21],
        [22, 20, 24]]

lon, lat = [-45, -44, -43], [-21, -20]

t2m = xr.DataArray(data=data,
                   dims=["lat", "lon"],
                   coords=dict(lon=("lon", lon), lat=("lat", lat)))

t2m.sel(lon=-45, lat=-21, method='nearest')

This way, you can use sel.
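If you only have the 2D arrays returned by np.meshgrid (as in the question), the 1D axes can be recovered from them. The following is a minimal sketch, assuming the grid really is regular; the names lon_1d, lat_1d and point are made up for illustration and the check at the top is just a sanity test, not something xarray requires:

```python
import numpy as np
import xarray as xr

data = [[23, 22, 21],
        [22, 20, 24]]

# 2D coordinate arrays as in the question; both have shape (2, 3)
x, y = np.meshgrid([-45, -44, -43], [-21, -20])

# On a regular grid every row of x is identical and every column of y is identical
assert (x == x[0, :]).all() and (y == y[:, [0]]).all()

# Collapse the 2D arrays back to one 1D axis per dimension
lon_1d = x[0, :]   # varies along the second axis
lat_1d = y[:, 0]   # varies along the first axis

t2m = xr.DataArray(data,
                   dims=["lat", "lon"],
                   coords={"lat": lat_1d, "lon": lon_1d})

# Dimension names now match coordinate names, so .sel works
point = t2m.sel(lon=-44, lat=-20, method="nearest")
```

If the sanity check fails, the grid is not regular and the coordinates cannot be reduced to 1D this way.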

cyril

The short answer is you can't use .sel to select individual elements within multi-dimensional coordinates.

See this question which goes into some possible options. If you have multi-dimensional coordinates lat/lon, it is not at all guaranteed that da.sel(lon=..., lat=...) will return a unique or correct result (note that xarray isn't designed to treat lat/lon as a special geospatial coordinate), so da.sel is not intended for this use case.

You either need to translate your intended (lon, lat) pair into (x, y) space, or mask the data with t2m.where((abs(t2m.lon - lon) < tol) & (abs(t2m.lat - lat) < tol), drop=True) or something similar, where lon and lat are the target values and tol is a tolerance you choose.
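Both options can be sketched roughly as follows. This is only an illustration on the small array from the question: the dimension names "y" and "x", the target point and the tolerance tol are assumptions, and the squared difference in degrees is a crude stand-in for a proper geographic distance:

```python
import numpy as np
import xarray as xr

data = [[23, 22, 21],
        [22, 20, 24]]
x, y = np.meshgrid([-45, -44, -43], [-21, -20])

# 2D lon/lat coordinates attached to generic dimensions "y" and "x"
t2m = xr.DataArray(data,
                   dims=["y", "x"],
                   coords={"lon": (["y", "x"], x),
                           "lat": (["y", "x"], y)})

target_lon, target_lat = -44.2, -20.1   # hypothetical point of interest

# Option 1: translate (lon, lat) into (y, x) indices via the nearest cell
dist2 = (t2m.lon - target_lon) ** 2 + (t2m.lat - target_lat) ** 2
iy, ix = np.unravel_index(dist2.values.argmin(), dist2.shape)
nearest = t2m.isel(y=iy, x=ix)

# Option 2: mask everything outside a tolerance box around the target
tol = 0.5   # in degrees, chosen arbitrarily for this example
masked = t2m.where((abs(t2m.lon - target_lon) < tol)
                   & (abs(t2m.lat - target_lat) < tol),
                   drop=True)
```

Option 1 always returns exactly one cell, while option 2 may return zero or several cells depending on tol.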

See the xarray docs on working with MultiDimensional Coordinates for more info.

Michael Delgado
  • thanks for your reply. I reworked the DataArray by first transforming it into a pandas dataframe, and then defining the lat/lon columns as indices of that dataframe, and then using the to_xarray method to transform it into a xarray.dataset: new_ds = t2m.to_dataframe().set_index(['lon', 'lat']).to_xarray() With this resulting dataset I can use the .sel, as the dimensions are lat/lon values. For my case it works, I'll take a look at the to_xarray source code... – Robson Barreto Sep 20 '21 at 20:18
  • yeah, the thing to watch out for is that if the lat/lon data are not orthogonal (e.g. you're not on a regular lat/lon grid because your data is actually on an equal-area projection or something), then you'll blow up the dimensionality of your data, because xarray allocates memory for every combination of every dimension. – Michael Delgado Sep 20 '21 at 20:37
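
The round trip through pandas described in these comments, and the blow-up it can cause, might look like the sketch below. The name="t2m" argument is added here because an unnamed DataArray cannot be converted to a DataFrame; the scattered points in the second half are invented purely to show the effect:

```python
import numpy as np
import pandas as pd
import xarray as xr

data = [[23, 22, 21],
        [22, 20, 24]]
x, y = np.meshgrid([-45, -44, -43], [-21, -20])

# DataArray with 2D lon/lat coordinates on generic dimensions
t2m = xr.DataArray(data,
                   dims=["y", "x"],
                   coords={"lon": (["y", "x"], x),
                           "lat": (["y", "x"], y)},
                   name="t2m")

# Regular grid: lon/lat become the index, then the dimensions of a Dataset
new_ds = t2m.to_dataframe().set_index(["lon", "lat"]).to_xarray()
value = new_ds["t2m"].sel(lon=-45, lat=-21, method="nearest")

# Irregular points (made up): 3 samples, each with its own lon and lat
scattered = pd.DataFrame({"lon": [-45.0, -44.3, -43.1],
                          "lat": [-21.0, -20.6, -20.2],
                          "t2m": [23.0, 20.0, 24.0]})
sparse_ds = scattered.set_index(["lon", "lat"]).to_xarray()

# 3 samples turn into a 3 x 3 grid, mostly filled with NaN
n_cells = sparse_ds["t2m"].size
n_missing = int(sparse_ds["t2m"].isnull().sum())
```

With N scattered points the resulting grid has up to N x N cells, which is why this approach is only reasonable when the points already lie on a regular lon/lat grid.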