
Is there a way to create (open/load) an Iris Cube from a file object (binary stream), or alternatively, from a netCDF4 Dataset object?

Specifically, I have a file served over a URL, but not by an OpenDAP server; iris.load_cube() & friends fail on this.

I realise that Iris prefers lazy loading, and therefore uses a URI instead of in-memory data, but this is not always feasible.

For a plain netCDF4 Dataset object, I can do the following:

from urllib.request import urlopen
import netCDF4 as nc

url = 'https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc'
with urlopen(url) as stream:
    ds = nc.Dataset('HadCRUT', memory=stream.read())

So I'm looking to do something similar for an Iris Cube, or to read the netCDF4 Dataset into a cube, without going through a temporary file on disk. I had hoped Iris would provide something for this, but I have not (yet) been able to find it in the reference documentation.

9769953

1 Answer


To read .nc files, Iris internally uses the same netCDF4-python library that you mention.

This means that in theory you can:

  1. Subclass CFReader, overriding its __init__ method; the only line that needs to change is self._dataset = netCDF4.Dataset(self._filename, mode='r').

  2. Either write your own load_cube function (based on this code) that uses your custom CFReader, or monkey-patch iris with your customized CFReader.

General idea for monkey-patching:

from urllib.request import urlopen

import iris.fileformats.cf
import netCDF4 as nc


def __patch_CFReader():
    # Default of False, so the first call doesn't raise AttributeError:
    if getattr(iris.fileformats.cf.CFReader, '_HACKY_PATCHED', False):
        return

    from iris.fileformats.cf import CFReader

    class CustomCFReader(CFReader):
        _HACKY_PATCHED = True

        def __init__(self, uri, *args, **kwargs):
            # ... other code copied from the original __init__ ...
            with urlopen(uri) as stream:
                self._dataset = nc.Dataset(uri, memory=stream.read())
            # ... other code copied from the original __init__ ...

    iris.fileformats.cf.CFReader = CustomCFReader


__patch_CFReader()

import iris
cube = iris.load_cube('https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT.4.6.0.0.median.nc')

WARNING! Depending on how imports are done in your project, monkey-patching may not always work as you expect. You may therefore prefer a library designed specifically for monkey-patching, e.g. gorilla:

https://gorilla.readthedocs.io/en/latest/tutorial.html

# my_patches.py:
from urllib.request import urlopen

import gorilla
import iris.fileformats.cf
import netCDF4 as nc

settings = gorilla.Settings(allow_hit=True)

@gorilla.patch(iris.fileformats.cf.CFReader, settings=settings)
def __init__(self, uri, *args, **kwargs):
    # ... other code copied from the original __init__ ...
    with urlopen(uri) as stream:
        self._dataset = nc.Dataset(uri, memory=stream.read())
    # ... other code copied from the original __init__ ...

# earliest_imported_module.py:
import gorilla
import my_patches

for patch in gorilla.find_patches([my_patches]):
    gorilla.apply(patch)
imposeren
  • I think you can directly call `iris.fileformats.netcdf.load_cubes`. I think the docs discourage direct use of those methods, but if you are going to use monkeypatching, then this is the least of your problems. – imposeren May 15 '19 at 08:38
  • Sadly, having looked at it further in more detail, I think this is also not the way to go: `load_cubes` calls `_load_cube`, which also requires a filename, to read netCDF variables into separate cubes where possible. To do that for an in-memory netCDF dataset requires, at a first glance, overriding various (private) functions in netcdf.py. In which case, overriding `load_cubes` essentially becomes implementing one's own fileformat module. – 9769953 May 21 '19 at 13:42