I am a scientist who recently converted from MATLAB to Python, and I am looking for ways to structure my (mainly 2D and 3D) datasets. I have searched the net quite a bit, and it seems to me that robust, general-purpose data structuring in Python is still somewhat up in the air. I think this question and any answers will be highly relevant for other Python scientists looking for a way to structure their data so that they can focus on the problem at hand rather than on the underlying implementation.
One example of the structure of my data is time x altitude x parameter, where parameter is e.g. density, temperature, etc. For the time dimension, I would like to use `datetime` objects, since they seem very robust and make conversion, formatting, etc. easy.
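To make the bookkeeping concrete, here is a minimal sketch of that layout in plain NumPy, assuming one 3D array plus separate coordinate lists for each axis. The axis values and parameter names are hypothetical, chosen only for illustration. It shows the manual label-to-index lookups that a labeled data structure would ideally handle for you.

```python
from datetime import datetime, timedelta

import numpy as np

# Hypothetical coordinate axes (illustrative values only)
times = [datetime(2014, 1, 1) + timedelta(hours=h) for h in range(4)]
altitudes = np.array([100.0, 200.0, 300.0])  # e.g. km
parameters = ["density", "temperature"]

# 3D data array laid out as time x altitude x parameter
data = np.zeros((len(times), len(altitudes), len(parameters)))

# Label-based access must be done by hand: map each label to an integer index
t_idx = times.index(datetime(2014, 1, 1, 2))          # exact match on time
z_idx = int(np.argmin(np.abs(altitudes - 200.0)))     # nearest altitude
p_idx = parameters.index("temperature")               # parameter by name

data[t_idx, z_idx, p_idx] = 250.0

# Temperature profile at one time step: a 1D array over altitude
profile = data[t_idx, :, p_idx]
```

Every lookup depends on keeping `times`, `altitudes` and `parameters` in sync with the array's axes, which is exactly the kind of detail one would rather not manage by hand.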
So far, I've looked into Pandas and MetaArray (from the SciPy cookbook).
Pandas' main drawback as a data type is that it is much more than just a data type. Each dimension of, e.g., a Panel (items, major axis, minor axis) seems to have certain preferred uses, though I cannot figure out which. In particular, indexing differs depending on the dimension, and some dimensions cannot be expanded after the data structure has been created. Thus, even though some of Pandas' functions, like grouping (`.groupby()`), are really useful for a small part of my work, Pandas is not really intuitive for interactive scientific work, and I find myself looking for other options for my day-to-day data type.
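For reference, here is a minimal sketch of the grouping use case, assuming a recent pandas version. Panel was deprecated and later removed from pandas (in 0.25), so this sketch swaps in a DataFrame with a (time, altitude) MultiIndex and one column per parameter. The data values are made up for illustration.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Hypothetical axes: two days at 6-hour resolution, three altitudes
times = [datetime(2014, 1, 1) + timedelta(hours=6 * i) for i in range(8)]
altitudes = [100.0, 200.0, 300.0]

# Long format: one row per (time, altitude) pair, one column per parameter
index = pd.MultiIndex.from_product([times, altitudes], names=["time", "altitude"])
n = len(index)
df = pd.DataFrame(
    {"density": np.ones(n), "temperature": np.arange(n, dtype=float)},
    index=index,
)

# Daily mean of every parameter at each altitude:
# group on the time level floored to midnight, plus the altitude level
time_level = df.index.get_level_values("time")
alt_level = df.index.get_level_values("altitude")
daily = df.groupby([time_level.normalize(), alt_level]).mean()
```

The result has one row per (day, altitude) pair, which illustrates both the strength of `.groupby()` and the awkwardness of flattening a 3D dataset into a 2D table.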
I have also looked briefly into MetaArray from the SciPy cookbook. This looks more like a clean-cut data type, and the indexing seems really intuitive and flexible, making it much better suited to interactive scientific work. However, it is not (AFAIK) part of any package and has to be downloaded and installed manually, which makes portability harder when collaborating with other scientists. Also, I can find almost no examples of it in use, so it seems rather like an ad-hoc solution to the problem of structuring N-dimensional datasets.
I have also heard of Blaze, touted as the "next generation of NumPy", but as far as I can see, it is still very much in early development. (Experiences with Blaze are welcome!)
Thus, I would like some examples (modules, packages, etc.) of how N-dimensional datasets (in particular 3D) can be structured in Python, most importantly in a way that facilitates interactive use.