
Retrieving the order of key-word arguments passed via **kwargs would be extremely useful in the particular project I am working on. It is about making a kind of n-d numpy array with meaningful dimensions (right now called dimarray), particularly useful for geophysical data handling.

For now say we have:

import numpy as np
from dimarray import Dimarray   # the handy class I am programming

def make_data(nlat, nlon):
    """ generate some example data
    """
    values = np.random.randn(nlat, nlon)
    lon = np.linspace(-180,180,nlon)
    lat = np.linspace(-90,90,nlat)
    return lon, lat, values

What works:

>>> lon, lat, values = make_data(180,360)
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0]
-180.0 -90.0

What does not:

>>> lon, lat, values = make_data(180,180) # square, no shape checking possible !
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0] # is random 
-90.0, -180.0  # could be (actually I raise an error in such ambiguous cases)

The reason is that Dimarray's __init__ method's signature is (values, **kwargs) and since kwargs is an unordered dictionary (dict) the best it can do is check against the shape of values.
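The shape check described above can be sketched as follows. This is only an illustration under stated assumptions: `order_axes_by_shape` is a hypothetical helper, not part of dimarray, and it matches axes to dimensions by length alone, which is exactly why equal-length axes have to raise an error.

```python
def order_axes_by_shape(shape, **kwaxes):
    """Return (name, axis) pairs ordered to match `shape`.

    Hypothetical helper: axes are matched to dimensions by length only,
    so two axes of equal length cannot be told apart and raise an error.
    """
    if len(kwaxes) != len(shape):
        raise ValueError("expected %d axes, got %d" % (len(shape), len(kwaxes)))
    lengths = [len(axis) for axis in kwaxes.values()]
    if len(set(lengths)) != len(lengths):
        raise ValueError("ambiguous: several axes share the same length; "
                         "pass the axis order explicitly")
    by_length = dict((len(axis), name) for name, axis in kwaxes.items())
    ordered = []
    for n in shape:
        if n not in by_length:
            raise ValueError("no unused axis of length %d" % n)
        name = by_length.pop(n)  # each axis may be used for one dimension only
        ordered.append((name, kwaxes[name]))
    return ordered


lat = list(range(3))
lon = list(range(5))
pairs = order_axes_by_shape((3, 5), lon=lon, lat=lat)
names = [name for name, _ in pairs]  # order follows the shape, not the call
```

With a square array, e.g. shape (3, 3) and two axes of length 3, the same call raises ValueError instead of guessing.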

Of course, I want it to work for any kind of dimensions:

a = Dimarray(values, x1=.., x2=...,x3=...)

so it has to be hard coded with **kwargs. The chances of ambiguous cases occurring increase with the number of dimensions. There are ways around that; for instance, with a signature (values, axes, names, **kwargs) it is possible to do:

a = Dimarray(values, [lat, lon], ["lat","lon"]) 

but this syntax is cumbersome for interactive use (ipython), since I would like this package to really be a part of my (and others !!) daily use of python, as an actual replacement of numpy arrays in geophysics.

I would be VERY interested in a way around that. The best I can think of right now is to use inspect module's stack method to parse the caller's statement:

import inspect
def f(**kwargs):
    print inspect.stack()[1][4]
    return tuple([kwargs[k] for k in kwargs])

>>> print f(lon=360, lat=180)
[u'print f(lon=360, lat=180)\n']
(180, 360)

>>> print f(lat=180, lon=360)
[u'print f(lat=180, lon=360)\n']
(180, 360)

One could work something out from that, but there are unsolvable issues since stack() catches everything on the line:

>>> print (f(lon=360, lat=180), f(lat=180, lon=360))
[u'print (f(lon=360, lat=180), f(lat=180, lon=360))\n']
[u'print (f(lon=360, lat=180), f(lat=180, lon=360))\n']
((180, 360), (180, 360))

Is there any other inspect trick I am not aware of, which could solve this problem ? (I am not familiar with this module) I would imagine getting the piece of code which is right between the brackets lon=360, lat=180 should be something feasible, no??

So I have the feeling, for the first time in python, of hitting a hard wall in terms of doing something which is theoretically feasible based on all available information (the ordering provided by the user IS valuable information !!!).

I read interesting suggestions by Nick there: https://mail.python.org/pipermail/python-ideas/2011-January/009054.html and was wondering whether this idea has moved forward somehow?

I see why it is not desirable to have an ordered **kwargs in general, but a patch for these rare cases would be neat. Anyone aware of a reliable hack?
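For reference, on Python 3.6 and later, PEP 468 guarantees that **kwargs preserves the order in which the keyword arguments were written in the call, so no inspect trick is needed there. This does not help on Python 2, which the examples in this post target. A minimal sketch (`axis_order` is a hypothetical name used only for illustration):

```python
def axis_order(**kwargs):
    # On Python 3.6+ (PEP 468), kwargs iterates in the order the caller wrote.
    return list(kwargs)


order_a = axis_order(lon=360, lat=180)
order_b = axis_order(lat=180, lon=360)
```

On such versions the two calls return the names in two different orders, matching what the caller typed.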

NOTE: this is not about pandas, I am actually trying to develop a light-weight alternative for it, whose usage remains very close to numpy. Will soon post the gitHub link.

EDIT: Note that this is relevant for interactive use of dimarray. The dual syntax is needed anyway.

EDIT2: I also see counter arguments: knowing that the data is not ordered could also be seen as valuable information, since it leaves Dimarray the freedom to check the shape of values and adjust the order automatically. It could even be that not remembering the dimensions of the data occurs more often than having the same size for two dimensions. So right now, I guess it is fine to raise an error for ambiguous cases, asking the user to provide the names argument. Nevertheless, it would be neat to have the freedom to make that kind of choice (how the Dimarray class should behave), instead of being constrained by a missing feature of python.

EDIT 3, SOLUTIONS: after the suggestion of kazagistar:

I did not mention that there are other optional attribute parameters such as name="" and units="", and a couple of other parameters related to slicing, so the *args construct would need to come with keyword name testing on kwargs.

In summary, there are many possibilities:

*Choice a: keep current syntax

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")

*Choice b: kazagistar's 2nd suggestion, dropping axis definition via **kwargs

a = Dimarray(values, ("lat", mylat), ("lon",mylon), name="myarray")

*Choice c: kazagistar's 2nd suggestion, with optional axis definition via **kwargs (note this involves names= to be extracted from **kwargs, see background below)

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, ("lat", mylat), ("lon",mylon), name="myarray")

*Choice d: kazagistar's 3rd suggestion, with optional axis definition via **kwargs

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")

Hmm, it comes down to aesthetics, and to some design questions (Is lazy ordering an important feature in interactive mode?). I am hesitating between b) and c). I am not sure the **kwargs really brings something. Ironically enough, what I started to criticize became a feature when thinking more about it...

Thanks very much for the answers. I will mark the question as answered, but you are most welcome to vote for a), b) c) or d) !

=====================

EDIT 4 : better solution: choice a) !!, but adding a from_tuples class method. The reason for that is to allow one more degree of freedom. If the axis names are not provided, they will be generated automatically as "x0", "x1" etc... It can then be used really just like pandas, but with axis naming. This also avoids mixing up axes and attributes in **kwargs, leaving it only for the axes. There will be more as soon as I am done with the doc.

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")
a = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")

EDIT 5 : more pythonic solution? : similar to EDIT 4 above in terms of the user api, but via a wrapper dimarray, while being very strict with how Dimarray is instantiated. This is also in the spirit of what kazagistar proposed.

 from dimarray import dimarray, Dimarray 

 a = dimarray(values, lon=mylon, lat=mylat, name="myarray") # error if lon and lat have same size
 b = dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")
 c = dimarray(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
 d = dimarray(values, [mylat, mylon, ...], name="myarray2")

And from the class itself:

 e = Dimarray.from_dict(values, lon=mylon, lat=mylat) # error if lon and lat have same size
 e.set(name="myarray", inplace=True)
 f = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")
 g = Dimarray.from_list(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
 h = Dimarray.from_list(values, [mylat, mylon, ...], name="myarray")

In the cases d) and h) axes are automatically named "x0", "x1", and so on, unless mylat, mylon actually belong to the Axis class (which I do not mention in this post, but Axes and Axis do their job, to build axes and deal with indexing).

Explanations:

class Dimarray(object):
    """ ndarray with meaningful dimensions and clean interface
    """
    def __init__(self, values, axes, **kwargs):
        assert isinstance(axes, Axes), "axes must be an instance of Axes"
        self.values = values
        self.axes = axes
        self.__dict__.update(kwargs)

    @classmethod
    def from_tuples(cls, values, *args, **kwargs):
        axes = Axes.from_tuples(*args)
        return cls(values, axes, **kwargs)  # forward attributes such as name=

    @classmethod
    def from_list(cls, values, axes, names=None, **kwargs):
        if names is None:
            names = ["x{}".format(i) for i in range(len(axes))]
        # from_tuples expects (name, axis) pairs, so names come first
        return cls.from_tuples(values, *zip(names, axes), **kwargs)

    @classmethod
    def from_dict(cls, values, names=None,**kwargs):
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)
        # with necessary assert statements in the above
        return cls(values, axes)
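To make the alternate-constructor pattern above concrete, here is a self-contained sketch. The `Axes` class below is a hypothetical stand-in (a plain list of (name, values) pairs), not the real Axes class of the project, and `from_dict` is left out because it needs the shape-matching logic.

```python
class Axes(list):
    """Hypothetical stand-in for the real Axes class: a list of (name, values)."""

    @classmethod
    def from_tuples(cls, *pairs):
        for pair in pairs:
            if len(pair) != 2:
                raise ValueError("expected (name, values) pairs, got %r" % (pair,))
        return cls(pairs)


class Dimarray(object):
    def __init__(self, values, axes, **attrs):
        # strict constructor: only a ready-made Axes instance is accepted
        if not isinstance(axes, Axes):
            raise TypeError("axes must be an instance of Axes")
        self.values = values
        self.axes = axes
        self.__dict__.update(attrs)

    @classmethod
    def from_tuples(cls, values, *pairs, **attrs):
        return cls(values, Axes.from_tuples(*pairs), **attrs)

    @classmethod
    def from_list(cls, values, axes, names=None, **attrs):
        if names is None:
            # automatic naming: x0, x1, ...
            names = ["x%d" % i for i in range(len(axes))]
        if len(names) != len(axes):
            raise ValueError("names and axes must have the same length")
        return cls.from_tuples(values, *zip(names, axes), **attrs)


a = Dimarray.from_list([[1, 2], [3, 4]], [[0, 1], [10, 20]], name="myarray")
axis_names = [name for name, _ in a.axes]
```

Keeping `__init__` strict and putting the conveniences into class methods means a subclass only has to honour one constructor signature.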

Here is the trick (schematically):

def dimarray(values, axes=None, names=None, name=..,units=..., **kwargs):
    """ my wrapper with all fancy options
    """
    if len(kwargs) > 0:
        new = Dimarray.from_dict(values, names=names, **kwargs) 

    elif isinstance(axes[0], tuple):
        new = Dimarray.from_tuples(values, *axes, **kwargs) 

    else:
        new = Dimarray.from_list(values, axes, names=names, **kwargs) 

    # reserved attributes
    new.set(name=name, units=units, ..., inplace=True) 

    return new
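The dispatch between the list-of-tuples form and the list-plus-names form can be sketched on its own. `normalize_axes` is a hypothetical helper, not part of the package. The form of the elements is tested with isinstance, since an identity comparison against the type `tuple` would never be true for a tuple instance.

```python
def normalize_axes(axes, names=None):
    """Return a list of (name, values) pairs from either calling form.

    Hypothetical helper: accepts [(name, values), ...] or [values, ...]
    with an optional parallel list of names.
    """
    if len(axes) > 0 and all(isinstance(ax, tuple) for ax in axes):
        if names is not None:
            raise ValueError("names is redundant when axes are (name, values) tuples")
        return list(axes)
    if names is None:
        # automatic naming: x0, x1, ...
        names = ["x%d" % i for i in range(len(axes))]
    if len(names) != len(axes):
        raise ValueError("names and axes must have the same length")
    return list(zip(names, axes))


from_pairs = normalize_axes([("lat", [0, 1]), ("lon", [10, 20])])
from_lists = normalize_axes([[0, 1], [10, 20]], names=["lat", "lon"])
auto_named = normalize_axes([[0, 1], [10, 20]])
```

One caveat of this sketch: axis values that are themselves given as tuples would be mistaken for (name, values) pairs, so the real implementation would need a stricter test.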

The only thing we lose is indeed the *args syntax, which could not accommodate so many options. But that's fine.

And it makes sub-classing easy, too. How does it sound to the python experts here?

(this whole discussion could be split in two parts really)

=====================

A bit of background (EDIT: in part outdated, for cases a), b), c), d) only), just in case you are interested:

*Choice a involves:

def __init__(self, values, axes=None, names=None, units="",name="",..., **kwargs):
    """ schematic representation of Dimarray's init method
    """
    # automatic ordering according to values' shape (unless names is also provided)
    # the user is allowed to forget about the exact shape of the array
    if len(kwargs) > 0:
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)

    # otherwise initialize from list
    # exact ordering + more freedom in axis naming 
    else:
        axes = Axes.from_list(axes, names)

    ...  # check consistency

    self.values = values
    self.axes = axes
    self.name = name
    self.units = units         

*Choices b) and c) impose:

def __init__(self, values, *args, **kwargs):
    ...

b) all attributes are naturally passed via kwargs, with self.__dict__.update(kwargs). This is clean.

c) Need to filter key-word arguments:

def __init__(self, values, *args, **kwargs):
   """ most flexible for interactive use
   """
   # filter out known attributes, falling back to their defaults
   default_attrs = {'name':'', 'units':'', ...} 
   for k in default_attrs:
       setattr(self, k, kwargs.pop(k, default_attrs[k]))
   names = kwargs.pop('names', None)

   # same as before
   if len(kwargs) > 0:
       axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)

   # same, just unzip
   else:
       names, numpy_axes = zip(*args)
       axes = Axes.from_list(numpy_axes, names)

This is actually quite nice and handy, the only (minor) drawback is that default parameters for name="", units="" and some other more relevant parameters are not accessible by inspection or completion.
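The filtering step of choice c) can be isolated in a small function. This is a sketch under assumptions: `split_kwargs` and `RESERVED_DEFAULTS` are hypothetical names, and the set of reserved attributes is reduced to name and units for illustration.

```python
RESERVED_DEFAULTS = {"name": "", "units": ""}  # assumed set of reserved attributes


def split_kwargs(kwargs, defaults=RESERVED_DEFAULTS):
    """Separate reserved attributes from axis definitions.

    Returns (attrs, kwaxes): attrs holds every reserved key, filled from
    `defaults` when absent; kwaxes holds whatever is left.
    """
    kwaxes = dict(kwargs)  # copy so the caller's dict is not mutated
    attrs = dict((key, kwaxes.pop(key, default))
                 for key, default in defaults.items())
    return attrs, kwaxes


attrs, kwaxes = split_kwargs({"lon": [0, 1], "lat": [2, 3], "name": "myarray"})
```

Popping with a default avoids modifying the dictionary while iterating over it, and it also makes explicit that an axis cannot be called "name" or "units" through this syntax.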

*Choice d: clear __init__

def __init__(self, values, axes, name="", units="", ..., **kwaxes)

But it is a bit verbose indeed.

==========

EDIT, FYI: I ended up using a list of tuples for the axes parameter, or alternatively the parameters dims= and labels= for axis name and axis values, respectively. The related project dimarray is on github. Thanks again to kazagistar.

Mahé
  • What would the order be if it's called as `d = {}; d['lat']=180; d['lon']=360; f(**d)`? – alko Dec 01 '13 at 18:06
  • If order matters, the caller must communicate the desired order in some fashion. A kwargs-style signature does not serve that purpose, so it's the wrong interface. The caller is going to have to type a few extra keystrokes ... unless you do something crazy like make `Dimarray` objects callable: `a = Dimarray(values)(lat = lat)(lon = lon)`. Same N of characters as the kwargs-style -- w00t! ;) – FMc Dec 01 '13 at 18:34
  • I will definitely avoid this :) ! – Mahé Dec 01 '13 at 19:06
  • @alko: d is unordered, and **d destroyed the order anyway even with OrderedDict. – Mahé Dec 01 '13 at 19:09

2 Answers


No, you cannot know the order in which items were added to a dictionary, since doing this increases the complexity of implementing the dictionary significantly. (For when you really, really need this, collections.OrderedDict has you covered.)

However, have you considered some basic alternative syntax? For example:

a = Dimarray(values, 'lat', lat, 'lon', lon)

or (probably the best option)

a = Dimarray(values, ('lat', lat), ('lon', lon))

or (most explicit)

a = Dimarray(values, [('lat', lat), ('lon', lon)])

At some level, though, arguments that need ordering are inherently positional. **kwargs is often abused for labeling, but an argument name generally shouldn't be "data", since it is a pain to set programmatically. Just make the two parts of the data that are associated clear with a tuple, use a list to preserve the ordering, and provide strong assertions + error messages to make it clear when the input is invalid and why.
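As a rough sketch of what such assertions and error messages could look like for the list-of-tuples form (`check_axes` is a hypothetical helper, and the exact checks are assumptions about what counts as valid input):

```python
def check_axes(shape, axes):
    """Validate a list of (name, values) pairs against an array shape.

    Hypothetical helper; each error says what is wrong and where.
    """
    if len(axes) != len(shape):
        raise ValueError("got %d axes for a %d-dimensional array"
                         % (len(axes), len(shape)))
    seen = set()
    for i, item in enumerate(axes):
        if not (isinstance(item, tuple) and len(item) == 2):
            raise TypeError("axis %d: expected a (name, values) tuple, got %r"
                            % (i, item))
        name, values = item
        if name in seen:
            raise ValueError("axis name %r is used more than once" % (name,))
        seen.add(name)
        if len(values) != shape[i]:
            raise ValueError("axis %r has length %d but dimension %d has size %d"
                             % (name, len(values), i, shape[i]))
    return axes


checked = check_axes((2, 3), [("lat", [0, 1]), ("lon", [0, 1, 2])])
```

Because the list fixes the order, a mismatch between an axis and its dimension can be reported precisely instead of being silently reordered.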

kazagistar
  • This is a good suggestion, thanks. For some reason I had not considered that. It is indeed much easier to type than the (axes=..., names=...) alternative, especially for high dimensional cases. It is easy to type/read and is also complementary to **kwargs features (see EDIT 2: automatic shape checking). I need to see how this fits with the rest. Writing an EDIT 3 to sum things up. – Mahé Dec 01 '13 at 19:28
  • In my personal coding style, I try to never mix semantic and non-semantic parameters for any given function. In this case, the semantics of the constructor are "give me the values, and the shape of the values", with the shape being an ordered list of label + integer pairs. Each semantically unique item is a single value. With that, I will propose one more solution, though it might be too verbose for your tastes: `a = Dimarray(values, Shape(('lat', lat), ('lon', lon)))` where Shape is a class that validates a particular shape. – kazagistar Dec 01 '13 at 19:52
  • I see your point, but I am really trying to make something easy to use for anyone without having to know which class does what. I do have an Axes class and an Axis class, themselves with labels etc...But I would like the api to be made of just a few names to remember, at best only one (`Dimarray`, plus actually `read`) and the usage to be straightforward without too many {[('". – Mahé Dec 01 '13 at 21:05
  • Actually I forgot to mention that the shape of the values is already contained in values (a multi-dimensional numpy array, via its shape attribute). What is missing is really the `name` and actual values of the axes (and their order in the ambiguous case I mentioned). I do not intend to force shaping a list, really, the starting point is a numpy array. – Mahé Dec 01 '13 at 21:09
  • So really there are two cases: a) every axis has a different size, in which case an ordered list of dimensions is redundant information (it just forces the user to know the underlying structure of the data, but part of the reason why I started programming this was also not to worry about that, and just do slicing like a.xs(time=1950) or a.mean(axis='time') instead of having to think about the actual position of the axis as I have to do with numpy or pandas). – Mahé Dec 01 '13 at 21:19
  • I think it can become a really cool package, I would like to think things through before I publish it. The syntax I mentioned with xs ("cross-section") slicing is also consistent with the **kwargs choice. Right now I feel choice c) is the best answer. – Mahé Dec 01 '13 at 21:21
  • and I forgot from above: b) at least two axes have the same size, the order ("meaningful shape"?) information is required. I am not sure I understand what you mean with non-semantic parameters: it gets confusing because both the main 2-D variable `a` and its axes `lat` and `lon` have `values` and `name` attributes. The axes do not contain the actual shape (a tuple, ordered) but need a bind to `a`. Could you develop a bit more what you mean with semantic and non-semantic? – Mahé Dec 01 '13 at 21:33
  • Sorry, I was misunderstanding what the parameters were. I assumed the values were the shape... instead, they are the "labels" for the items in the labeled dimension. – kazagistar Dec 01 '13 at 21:40
  • Semantic parameters are those which have a specific and unique meaning attached to each one. For example a "name" or an "age". – kazagistar Dec 01 '13 at 21:44
  • Exactly. Now I had one more thought, which makes my life easier given what is already there: I might reserve the possibility to automatically name the axes as x0, x1, etc... if the axis names are not provided. To really make it a user-friendly class. So I will go for choice a) and add a from_tuples() option to accept tuples as you suggested. – Mahé Dec 01 '13 at 21:47
  • **kwargs is useful for passing along parameters to subfunctions, but causes problems if you abuse it. For example, what if I want to label one of my axes "values"? The reason the first two options are problematic is that you break the ability for people to use explicit parameter names by having variable arguments. Sometimes, I want to write `a = Dimarray(values=myvalues, ...)` but that will cause errors (at least in python2) because varargs are not allowed after keyword arguments. Whereas `a = Dimarray(values=myvalues, labels=[('lat', lat), ('lon', lon)])` will always work. – kazagistar Dec 01 '13 at 21:57
  • I do not see the problem. If you happen to want to name the axes with "reserved" keywords (at least for initialization), then you should go for a list or tuples initialization which allows you to do that without conflict. The spirit of the Dimarray class is to use keyword arguments only for the axes, or possibly for the attributes, but not for values as it is the core information and should always be first. The second core information is the actual values (or index) of the axes (like in pandas), which should naturally be second. Then come axes names, and finally other attributes. – Mahé Dec 01 '13 at 22:06
  • This choice of having everything in the __init__ might sound confusing, but as you see in the background code it is still ok, and importantly there is always a non-ambiguous possibility of doing things, via tuples or lists, and in case some reserved attributes need to be used, one can always set these afterwards (set method, direct access to a.axes[0].name, etc., everything being dynamic). I will post package and documentation soon. – Mahé Dec 01 '13 at 22:10
  • "The spirit of the Dimarray class is to use keyword arguments only for the axes, or possibly for the attributes, but not for values as it is the core information and should always be first." Unfortunately, this seems to directly contrast with the "spirit of python", which is to allow explicit rather than implicit parameter names. – kazagistar Dec 01 '13 at 22:24
  • This is certainly not my intention ;) The statement you quoted was maybe not the most inspired of this discussion. In reality, I think I am not so far from that with option a. `__init__(self, values, axes=None, names=None, units="",name="",descr="",dtype=None, slicing="exact", **kwaxes)` is the actual signature (at present). Most parameters are named, apart from **kwaxes (sic), but since it symbolizes an unknown number of axes it really does reflect the nature of the problem. – Mahé Dec 01 '13 at 23:28
  • My misstatement is possibly a misunderstanding of your previous comment `Dimarray(values=myvalues, ...)` etc... I mean I do not see how it applies to the case I present. `values` does have a name, and even corresponds to the attribute name where this variable is stored. If you really want to call it with name, then you indeed have to enter `axes=myaxes, names=mynames` (or just omit `names=mynames` for automatic naming or if axes is already an Axes object). I think it is pretty robust. Or just omit everything for quick typing. Interactivity also becomes increasingly relevant! – Mahé Dec 01 '13 at 23:32
  • There would be more to say, but I suggest waiting for me to finish the doc and post the package, then it will be possible to open a broader discussion based on experience with the beta version. – Mahé Dec 01 '13 at 23:48
  • See my EDIT 5 above: have I found the Grail? Or am I missing something important? – Mahé Dec 02 '13 at 01:39
  • FYI, that's the project as of today (list of tuples is what I retained): https://github.com/perrette/dimarray – Mahé Mar 25 '14 at 11:28

There is a module made especially to handle this:

https://github.com/claylabs/ordered-keyword-args

Without using the module:

def multiple_kwarguments(first , **lotsofothers):
    print first

    for i,other in lotsofothers.items():
         print other
    return True

multiple_kwarguments("first", second="second", third="third" ,fourth="fourth" ,fifth="fifth")

output:

first
second
fifth
fourth
third

Using the orderedkwargs module:

from orderedkwargs import orderedkwargs
@orderedkwargs
def multiple_kwarguments(first , *lotsofothers):
    print first

    for i, other in lotsofothers:
        print other
    return True


multiple_kwarguments("first", second="second", third="third" ,fourth="fourth" ,fifth="fifth")

Output:

first
second
third
fourth
fifth

Note: A single asterisk is required in the function signature when using this module's decorator above the function.

Berci
igauravsehrawat