Efficiently compute distances between thousands of coordinate pairs

Question

I have a catalog I opened in python, which has about 70,000 rows of data (ra, dec coordinates and object name) for various objects. I also have another list of about 15,000 objects of interest, which also appear in the previously mentioned catalog. For each of these 15,000 objects, I would like to see if any other objects in the large 70,000 list have ra, dec coordinates within 10 arcseconds of the object. If this is found to be true, I'd just like to flag the object and move on to the next one. However, this process takes a long time, since the distances are computed between the current object of interest (out of 15,000) 70,000 different times. This would take days! How could I accomplish the same task more efficiently? Below is my current code, where all_objects is a list of all the 15,000 object names of interest and catalog is the previously mentioned table data for 70,000 objects.

from astropy.coordinates import SkyCoord
from astropy import units as u

for obj_name in all_objects:
    obj_ind = list(catalog['NAME']).index(obj_name)
    c1 = SkyCoord(ra=catalog['RA'][obj_ind]*u.deg, dec=catalog['DEC'][obj_ind]*u.deg, frame='fk5')
    for i in range(len(catalog['NAME'])):
        if i != obj_ind:
            # Compute distance between object and other source
            c2 = SkyCoord(ra=catalog['RA'][i]*u.deg, dec=catalog['DEC'][i]*u.deg, frame='fk5')
            sep = c1.separation(c2)
            contamination_flag = False
            if sep.arcsecond <= 10:
                contamination_flag = True
                print('CONTAMINATION FOUND')
                break

this may be helpful https://stackoverflow.com/questions/27575854/vectorizing-a-function-in-pandas — SuperStew, Feb 14 '19 at 21:55

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

1 Create your own separation function

This step is really easy once you look at the implementation and ask yourself: "how can I make this faster"

def separation(self, other):
    from . import Angle
    from .angle_utilities import angular_separation # I've put that in the code bellow so it is clearer

    if not self.is_equivalent_frame(other):
        try:
            other = other.transform_to(self, merge_attributes=False)
        except TypeError:
            raise TypeError('Can only get separation to another SkyCoord '
                            'or a coordinate frame with data')

    lon1 = self.spherical.lon
    lat1 = self.spherical.lat
    lon2 = other.spherical.lon
    lat2 = other.spherical.lat

    sdlon = np.sin(lon2 - lon1)
    cdlon = np.cos(lon2 - lon1)
    slat1 = np.sin(lat1)
    slat2 = np.sin(lat2)
    clat1 = np.cos(lat1)
    clat2 = np.cos(lat2)

    num1 = clat2 * sdlon
    num2 = clat1 * slat2 - slat1 * clat2 * cdlon
    denominator = slat1 * slat2 + clat1 * clat2 * cdlon

    return Angle(np.arctan2(np.hypot(num1, num2), denominator), unit=u.degree)

It calculates a lot of cosines and sines, then creates an instance of Angle and converts to degrees then you convert to arc seconds.

You might not want to use Angle, nor do the tests and conversions at the beginning, nor doing the import in the function, nor doing so much variable assignment if you need performance.

The separation function feels a bit heavy to me, it should just take numbers and return a number.

2 Use a quad tree (requires a complete rewrite of your code)

That said, let's look at the complexity of your algorithm, it checks every element against every other element, complexity is O(n**2) (Big O notation). Can we do better...

YES You could use a Quad-tree, worst case complexity of Quad tree is O(N). What that basically means if you're not familiar with Big O is that for 15 000 element, the lookup will be 15 000 times what it is for 1 element instead of 225 000 000 times (15 000 squared)... quite an improvement right... Scipy has a great Quad tree library (I've always used my own).

Efficiently compute distances between thousands of coordinate pairs

1 Answers1

1 Create your own separation function

2 Use a quad tree (requires a complete rewrite of your code)