I have a large collection of coordinates contained within a single astropy coordinate object. I would like to apply a function to each coordinate and produce an output array of the same shape, but this is slow.
(In my case, the function is a model that takes galactocentric coordinates and outputs a 'brightness' associated with that point in space.)
Illustration:
In [339]: type(data)
Out[339]: astropy.coordinates.builtin_frames.galactocentric.Galactocentric
In [340]: data.shape, data.size # Not that big, really
Out[340]: ((21, 21, 31), 13671)
In [341]: data[0,0,0] # An example of a single coordinate
Out[341]:
<Galactocentric Coordinate (galcen_distance=8.3 kpc, galcen_ra=266d24m18.36s, galcen_dec=-28d56m10.23s, z_sun=27.0 pc, roll=0.0 deg): (rho, phi, z) in (kpc, deg, kpc)
( 8.29995608, 180., 0.027)>
In [342]: func = vectorize(lambda coord: 0) # Dummy function
In [343]: %time func(data).shape
CPU times: user 33.2 s, sys: 88.1 ms, total: 33.3 s
Wall time: 33.4 s
Out[343]: (21, 21, 31)
I suspect that this is slow because, at each iteration, a new coordinate object is being initialized before being passed to the vectorized function (discussion).
A solution might be to convert the coordinate object into a plain numpy array before applying the function, discarding unit information and metadata (since the units are homogeneous).
However, I can’t find a way to do that.
How should I approach this? If converting to vanilla numpy data types is the best solution, how is that accomplished?
Thanks!
Minimal working example:
from numpy import *
from astropy import units as u
from astropy.coordinates import Galactocentric
# Generate lots of coordinates
x = linspace(0, 1, 1000)*u.pc  # num must be an int in recent numpy
data = Galactocentric(x=x, y=0*u.pc, z=0*u.pc)
@vectorize
def func(coord):
    '''ultimately in terms of coord.x, coord.y, coord.z...'''
    return 0
# timeit
func(data)