0

There are many questions here on SO about ValueErrors raised when input arrays of different shapes cannot be broadcast. But none of them are related to masks:

ValueError: could not broadcast where mask from shape (1) into shape ()

What I'm doing is the following:

  • go through all variables in (the root group of) a NetCDF 4 file
  • copy their content from one NetCDF file into another, newly created one

The reason is irrelevant here, but can be found in this question.

Minimal code:

import netCDF4

with netCDF4.Dataset(inputFile) as src, \
            netCDF4.Dataset(outputFile, "w") as dst:

    # copy global attributes all at once via dictionary
    dst.setncatts(src.__dict__)

    for variable in src.variables.values():
        # create dimensions first (skip those already created
        # for an earlier variable)
        for dimName in variable.dimensions:
            if dimName in dst.dimensions:
                continue
            dimension = src.dimensions[dimName]
            dst.createDimension(
                        dimName,
                        (len(dimension)
                            if not dimension.isunlimited() else None))

        # now create variable
        newVar = dst.createVariable(
                variable.name,
                variable.datatype,
                variable.dimensions)

        # copy variable attributes all at once via dictionary
        newVar.setncatts(variable.__dict__)

        # copy content
        newVar[:] = variable[:]

This works on newer Python versions (tested with >= 3.6) for all variables, but fails on Python 2.7 for scalar NetCDF variables that have not been filled. In the debugger, right when the exception is raised, the variable of interest looks like this (in both Python 2.7 and 3.6):

>>> variable.shape
()
>>> type(variable[:])
<class 'numpy.ma.core.MaskedConstant'>
>>> variable[:].mask
array(True, dtype=bool)
>>> variable[:]
masked
>>> print(variable[:])
--

So this only happens with empty scalar NetCDF variables. Assigning a masked constant to another masked constant, on the other hand, works. It is only the masked constant inside the netCDF4._netCDF4.Variable that fails. Why? And how can it be fixed?
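
The object shown in the debugger session can be inspected without any NetCDF file. The following is a minimal sketch using only numpy; it assumes (as the debugger output suggests) that an unfilled scalar variable is read back as the `numpy.ma.masked` singleton.

```python
import numpy as np

# numpy.ma.masked is the singleton masked constant
value = np.ma.masked

print(type(value))        # the MaskedConstant class
print(value.shape)        # empty tuple: the value is a scalar
print(bool(value.mask))   # True: the value is masked, i.e. unset
print(value)              # rendered as "--"
```

This only reproduces the value being read; it does not reproduce the failing assignment, which needs the affected numpy and netCDF4 versions.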

edit: failure occurs with numpy 1.7.1 and netcdf4 1.2.7. <- this turned out to be the source of the issue, see this answer.


variable is of type netCDF4._netCDF4.Variable, and print(dir(variable)) shows ['__array__', '__class__', '__delattr__', '__delitem__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__getitem__', '__hash__', '__init__', '__len__', '__new__', '__orthogonal_indexing__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '_assign_vlen', '_cmptype', '_enumtype', '_get', '_getdims', '_getname', '_grp', '_grpid', '_has_lsd', '_iscompound', '_isenum', '_isprimitive', '_isvlen', '_name', '_nunlimdim', '_put', '_toma', '_varid', '_vltype', 'assignValue', 'chunking', 'datatype', 'delncattr', 'dimensions', 'dtype', 'endian', 'filters', 'getValue', 'get_var_chunk_cache', 'getncattr', 'group', u'long_name', 'mask', 'name', 'ncattrs', 'ndim', 'renameAttribute', 'scale', 'set_auto_mask', 'set_auto_maskandscale', 'set_auto_scale', 'set_var_chunk_cache', 'setncattr', 'setncattr_string', 'setncatts', 'shape', 'size']


Yes I know Python 2 is EOL. But it is needed for [insert reason from legacy dev environment].

hintze

2 Answers

2

If the documentation applies to the version you are using, then when the shape is an empty tuple you could use either the netCDF4.Variable.getValue() / netCDF4.Variable.assignValue() combination, e.g.:

if variable.shape:
    newVar[:] = variable[:]
else:
    newVar.assignValue(variable.getValue())

or newVar[...] = variable[...], e.g.:

slicing = slice(None) if variable.shape else Ellipsis
newVar[slicing] = variable[slicing]
norok2
  • unfortunately, `getValue()` raises an `IndexError: to retrieve values from a non-scalar variable, use slicing.` At the same time, your slicing proposition raises the same ValueError, just like using [:] does. Still, your comment and your answer pointed me in the right direction: testing for the variable shape! Therefore, upvote. `newVar[:] = variable[:] if variable.shape else numpy.array([])` is the solution (maintains Python 3 compatibility as well) – hintze Jun 25 '20 at 14:05
  • should I post an answer myself with the solution? – hintze Jun 25 '20 at 14:09
  • I think this is because you tried to use `getValue()` on a non-scalar variable, otherwise I see no reason why the above code should not be working, but your code seems to be losing the value of `variable` if it is a scalar. – norok2 Jun 25 '20 at 14:11
  • @hintze if you think that is going to help future readers, I do not see why not. – norok2 Jun 25 '20 at 14:11
  • You are right, I did not take care enough. Now testing for the shape (`newVar[:] = variable[:] if variable.shape else newVar.assignValue(variable.getValue())`) raises again the original `ValueError` at the very same variable. Strange. – hintze Jun 25 '20 at 14:19
  • @hintze This is because you are doing it incorrectly. It should be: `if variable.shape: newVar[:] = variable[:] else: newVar.assignValue(variable.getValue())` (must be in multiple lines). See edits. – norok2 Jun 25 '20 at 14:22
  • I'm sorry, I tried both. Both fail with the same result (most probably in different ways). The conditional expression was a mistake on my side. Actually, I'm now wondering why the conditional expression is even syntactically allowed, `assignValue` seems to return something. – hintze Jun 25 '20 at 14:36
  • @hintze Every function returns something in Python, if nothing is set explicitly, it sets `None`. – norok2 Jun 25 '20 at 14:54
1

As surfaced through @norok2's answer, neither using getValue() nor slicing with Ellipsis on scalar variables works in this case. Both (unexpectedly) raise the same ValueError on scalar NetCDF variables with Python 2.7. Nevertheless, based on that answer, the following workaround was derived, which avoids the problem by not assigning to the newly created scalar NetCDF variable at all.

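value = variable[:]  # read the data only once
# skip scalar variables that are masked (never written); newVar stays unset
if isinstance(value, numpy.ma.core.MaskedConstant) and value.mask[()]:
    continue
newVar[:] = value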

The result is correct, because the original variable is skipped only if it is a) scalar and b) masked (unset). Not copying means that newVar remains unset, which is the same state it would be in had the masked value been copied.
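
The skip logic could also be wrapped in a small helper so it can be reused outside the copy loop. This is only a sketch: `is_unset_scalar` and `copy_values` are hypothetical names, and the check uses duck typing (a `shape` of `()` plus a truthy `mask`) instead of `isinstance`, on the assumption that this identifies the same values as the `MaskedConstant` test.

```python
def is_unset_scalar(value):
    """Return True for a masked scalar such as numpy.ma.masked."""
    # Check the shape first so bool() is only ever applied to a scalar mask.
    is_scalar = getattr(value, "shape", None) == ()
    return is_scalar and bool(getattr(value, "mask", False))


def copy_values(src_var, dst_var):
    """Copy src_var into dst_var unless the source is an unset scalar.

    Returns True if data was copied, False if the variable was skipped.
    """
    value = src_var[:]  # read the data only once
    if is_unset_scalar(value):
        return False  # leave dst_var unset, just like the source
    dst_var[:] = value
    return True
```

Inside the loop, `copy_values(variable, newVar)` would then replace the final assignment, and the return value shows which variables were skipped.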


This issue appears to be version specific. With numpy 1.10.0 up until numpy 1.12.1, the raised exception changes to

IndexError: too many indices for array

Since numpy 1.13.0, the assignment works as expected.

This GitHub issue seems to be linked.
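
If the same script has to run in both old and new environments, the version boundary observed above could be checked at runtime. The helper below is a sketch under that assumption: `needs_scalar_workaround` is a hypothetical name, and the 1.13 threshold comes only from the versions tested here (the installed netCDF4 version may matter as well).

```python
import re


def needs_scalar_workaround(numpy_version):
    """Return True if the given numpy version is older than 1.13."""
    # Take only the leading "major.minor" part, so suffixes such as
    # "rc1" or ".dev0" do not break the comparison.
    match = re.match(r"(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (numpy_version,))
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 13)
```

It would be called as `needs_scalar_workaround(numpy.__version__)`, and the masked-scalar skip applied only when it returns True.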

hintze