9

Let's say I have a class called Star which has an attribute color. I can get the color with star.color.

But what if I have a NumPy array of these Star objects? What is the preferred way of getting an array of the colors?

I can do it with

colors = np.array([s.color for s in stars])

But is this the best way to do it? It would be great if I could just write colors = stars.color or colors = stars->color, as in some other languages. Is there an easy way of doing this in NumPy?

Sven Marnach
Dave31415
  • possible duplicate of [numpy array of objects](http://stackoverflow.com/questions/4877624/numpy-array-of-objects) – YXD Mar 20 '12 at 17:32

3 Answers

9

The closest thing to what you want is to use a recarray instead of an ndarray of Python objects:

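import numpy

num_stars = 10
# One record per star, with three float fields
dtype = numpy.dtype([('x', float), ('y', float), ('colour', float)])
a = numpy.recarray(num_stars, dtype=dtype)
# Fields are available as attributes of the whole array
a.colour = numpy.arange(num_stars)
print(a.colour)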

prints

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]

Using a NumPy array of Python objects is usually less efficient than using a plain list, whereas a recarray stores the field values in typed memory rather than as references to Python objects.
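If the Star objects already exist, one way to get from them to a recarray is sketched below. The Star class, its x, y and colour attributes, and the sample values are assumptions made up for illustration; the idea is only to build one tuple per object and let NumPy pack them into a structured array.

```python
import numpy as np

class Star:
    # Hypothetical stand-in for an existing plain-data class
    def __init__(self, x, y, colour):
        self.x = x
        self.y = y
        self.colour = colour

stars = [Star(0.0, 1.0, 0.5), Star(2.0, 3.0, 0.7), Star(4.0, 5.0, 0.9)]

dtype = np.dtype([('x', float), ('y', float), ('colour', float)])

# One tuple per object, with fields in the same order as the dtype
records = [(s.x, s.y, s.colour) for s in stars]

# Build a structured array, then view it as a recarray for attribute access
a = np.array(records, dtype=dtype).view(np.recarray)

print(a.colour)        # all colours as one float array
print(a[a.x > 1.0].y)  # boolean masks select whole records
```

The conversion copies the data, so later changes to the Star objects are not reflected in the array, and the reverse is also true.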

Sven Marnach
  • Cool. So it makes them just like IDL arrays of structures which is what I wanted. How do I use this if I already have a regular python Class defined? Is there a simple way to do that? – Dave31415 Mar 20 '12 at 17:44
  • @Dave31415: IDL? So you are an astronomer, or is anybody *outside* astronomy really using this? As to your question: Without seeing the class definition, this is a bit hard to answer. Using NumPy, you generally don't want "methods" operating on single records, but rather functions that can operate on the whole array at once. So you'd need to vectorise your methods. – Sven Marnach Mar 20 '12 at 17:51
  • Trying to be an ex-astronomer. So I guess what you are saying is that arrays of objects is not the preferred data structure for numpy. But then what is? I can make Classes whose attributes are numpy arrays. Is that the better way? It doesn't sound like what I want. – Dave31415 Mar 20 '12 at 17:58
  • @Dave31415: I'm confused now. What I said is that using a NumPy `recarray` is preferred over a NumPy array of Python objects, at least when using NumPy at all. If a plain list of `Star` instances does the job for you, you might as well go with a plain list. Again, it is hard to give advice without knowing more about your use case. – Sven Marnach Mar 20 '12 at 18:03
  • Well, to start, my classes might not have methods. They are basically just like "structures" in C or Python. If so, it seems that recarray would work well for this, right? In this case, do you even bother defining classes, or do you go directly to defining the dtype? – Dave31415 Mar 20 '12 at 18:16
  • @Dave31415: You wouldn't define a class in this case, and a `recarray` would work fine. – Sven Marnach Mar 20 '12 at 18:17
4

You could use numpy.fromiter((s.color for s in stars), float) (note the lack of square brackets; the dtype argument is required). That avoids creating the intermediate list, which you might care about if you are using NumPy.
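A minimal sketch of the fromiter approach, assuming color is a plain float; the Star class and the sample values here are hypothetical. Passing count is optional, but it lets NumPy allocate the output once instead of growing it.

```python
import numpy as np

class Star:
    # Hypothetical stand-in; only the color attribute matters here
    def __init__(self, color):
        self.color = color

stars = [Star(0.1), Star(0.2), Star(0.3)]

# The generator is consumed directly, so no intermediate list is built
colors = np.fromiter((s.color for s in stars), dtype=float, count=len(stars))

print(colors.dtype, colors.shape)
```

fromiter only produces one-dimensional arrays, so this fits scalar attributes such as a single color value.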

(Thanks to @SvenMarnach and @DSM for their corrections below).

Marcin
  • Unfortunately that won't work: you'll get something like `array(<generator object <genexpr> at 0x9cff34c>, dtype=object)`. (I once had a bug in my code that was ultimately due to the fact I thought this would work.) – DSM Mar 20 '12 at 17:58
  • You'd need to use `numpy.fromiter()` for this. – Sven Marnach Mar 20 '12 at 18:00
  • Note: to get that to work in recent numpys, you need `numpy.fromiter((s.color for s in stars), float)`. Also, adding `count=len(stars)` will make it more efficient for long arrays. – Danica Sep 02 '13 at 16:02
0

In case star is a more complicated class, here is an approach to get and set the attributes with a helper class on top.

import numpy as np

class star:
    def __init__(self, mass=1, radius=1):
        self.mass = mass
        self.radius = radius

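class Stars(list):
    """A list of star objects whose attributes can be read and set in bulk."""

    def __getattr__(self, attr):
        # Called only when normal lookup fails: gather the attribute from each star
        return np.array([getattr(s, attr) for s in self])

    def __setattr__(self, attr, vals):
        if hasattr(vals, '__len__'):
            # Sequence: assign one value per star
            for s, val in zip(self, vals):
                setattr(s, attr, val)
        else:
            # Scalar: assign the same value to every star
            for s in self:
                setattr(s, attr, vals)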


s1 = star(1, 1.1)
s2 = star(2, 3)

S = Stars([s1, s2])

print(S.mass)
print(S.radius)

S.density = S.mass / S.radius**3
print(S.density)
print(s1.density)

Of course, if the class can be reimplemented as a recarray, that should be more efficient. Such a reimplementation might be undesirable, though.

Note that computations on the gathered arrays, such as the density calculation, are still vectorised. Those are often the bottleneck, rather than getting and setting attributes.