1

I have been able to read a netcdf file:

import xarray as xr

ds_disk = xr.open_dataset(file_nc_in)

Then I have read a shape file as:

shapefile_path = 'shapefile.shp'
shp_noce       = gpd.read_file(shapefile_path)

This is my projection:

shp_noce.crs
erived Projected CRS: EPSG:32632>
Name: WGS 84 / UTM zone 32N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 6°E and 12°E, northern hemisphere between equator and 84°N, onshore and offshore. Algeria. Austria. Cameroon. Denmark. Equatorial Guinea. France. Gabon. Germany. Italy. Libya. Liechtenstein. Monaco. Netherlands. Niger. Nigeria. Norway. Sao Tome and Principe. Svalbard. Sweden. Switzerland. Tunisia. Vatican City State.
- bounds: (6.0, 0.0, 12.0, 84.0)
Coordinate Operation:
- name: UTM zone 32N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

I would like to follow this post in order to mask my netcdf file.

However, I get the following error:

Unable to allocate 21.9 GiB for an array with shape (14245, 643, 641) and data type float32

Is it possible to work time step by time step? How can I remove nan. I do not think that it is useful to keep all the data stored, even of the cut region.

any suggestions?

thanks

diedro
  • 511
  • 1
  • 3
  • 15
  • 1
    I would strongly recommend you check out the [xarray](https://docs.xarray.dev/en/stable/) package and use that when working with netcdf data. It’smuch easier to use and provides tools for doing computation with labeled indices (think n-dimensional pandas). And then take a look at [regionmask](https://regionmask.readthedocs.io/en/stable/) - it works with xarray and spatial information like shapefiles to do what you’re trying to accomplish. – Michael Delgado Aug 08 '22 at 14:25
  • I find xarray attracting. On the other hand, I find regionmask quite difficult. It seems more relating to plotting with mask then working with mask. My file is indeed quite big. I would like to cut the data that I do not need and the work with the remaining. – diedro Aug 09 '22 at 13:44
  • 1
    Did you look through the second half of the [working with a 3D mask](https://regionmask.readthedocs.io/en/stable/notebooks/mask_3D.html#working-with-a-3d-mask) tutorial? It focuses on cropping and aggregating data – Michael Delgado Aug 09 '22 at 14:44

1 Answers1

0

If all you're trying to do is limit the amount of data you're loading, I would clip the data to a bounding box rather than masking the array. Masking is a complex operation which requires building a mesh of lat/lon values the same size as your array; depending on your array dimensionality the masking operation may be more intensive than simply loading your data. Additionally, once you've done this complex mask, the you'll still have a full array the size of the shape's bounding box - areas outside the shape will have been converted into NaNs but these still take memory.

Working from your problem, it sounds like you have a dataset in a Mercator projection (EPSG:3395) with dims x, y, and a shapefile in lat/lon (WGS84/EPSG:4326) coordinates. I'll convert the shapefile to Mercator and then select the portion of the array falling within the GeoDataFrame's .total_bounds using xr.Dataset.sel:

ds # an xarray.Dataset with dims (x, y, ...)
shapes # a geopandas.GeoDataFrame

shapes_mercator = shapes.to_crs("epsg:3395")
minx, miny, maxx, maxy = shapes_mercator.total_bounds

subset = ds.sel(
    y=((ds.y >= miny) & (ds.y <= maxy)),
    x=((ds.x >= miny) & (ds.x <= maxx))
)

now you should have significantly less data in your dataset. You could check how much data is in each array with ds[varname].nbytes, e.g. check the number of gigabytes with ds[varname].nbytes / 1024 ** 3. If that amount is loadable on your computer (usually you want at least 2-3x the array size to be free on your machine to be able to work on your array), then you can load the data with subset = subset.load()

Michael Delgado
  • 13,789
  • 3
  • 29
  • 54
  • Your solution works almost perfectly. I have, however, some problem with the projection. If I convert the shape file, I get 0 bites in the subset. I supposed the problem is in the selection of the reference system (shapes.to_crs("epsg:3395")). I have tried to update the questions providing some additional information about the shape file. – diedro Aug 11 '22 at 07:58