
The problem: there is a picture P of a bunch of stars. The computer has to extract the stars from P and compare them against data about known stars, in order to figure out which stars are in the photo and properly identify them. The task is made more complicated because the photo can have an arbitrary rotation and scale applied to it.

In astronomy and astrometry, this is called plate-solving.

It seems graph DBs should provide some natural advantages: easily expressing the distance between two stars as an edge property, easily holding the data in multiple data structures such as KD-trees, etc. For example, something like the sketch below.
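To make the edge-property idea concrete, here is a minimal sketch of what I mean, with made-up star names and pixel coordinates, storing each pairwise distance as an edge weight in networkx:

import itertools
import math
import networkx as nx

# Hypothetical detected stars: name -> (x, y) pixel coordinates.
stars = {"s1": (10.2, 4.1), "s2": (11.0, 5.5), "s3": (9.1, 6.0)}

G = nx.Graph()
for (n1, p1), (n2, p2) in itertools.combinations(stars.items(), 2):
    G.add_edge(n1, n2, distance=math.dist(p1, p2))  # distance as an edge property

print(G.edges(data=True))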

Is there anything close to this? Either on point, or related to searching for large pattern matches inside a GDB?

  • have a look at [Is it possible to make a correlation between an image and a constellation?](http://stackoverflow.com/a/28958816/2521214) – Spektre Jan 12 '16 at 08:13

2 Answers


I don't think graph DBs are useful for this. They may be good at detecting graph patterns, but I suppose the patterns here are not quite clear, because depending on the picture quality there may be any number of other faint stars in between. Do you know the distance between the stars (arc seconds)?

I suppose you could use a multi-dimensional index, though. First, pick a star (maybe the brightest, or the one with the highest redshift, or whatever, I'm not an astronomer). Then pick the 2nd and 3rd brightest stars within a given distance. This gives you 6 values: 3 brightnesses, 2 distances (from the 1st star), and one angle between the 2nd and 3rd. You can feed this into a 6-dimensional index (R-tree, kd-tree, quadtree or PH-tree) and then do a nearest-neighbour search. You could add additional dimensions by using the classification (M-star, ...) or simply the main colour of each star. Or you can add additional stars with their respective distances to star #1 and their angles relative to star #2.

This should give you a fixed, rotation-free search index that can be queried with kNN queries, range queries or window queries; a sketch of the idea follows below.
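A minimal sketch of that index in Python, assuming the pixel scale is known; scipy's cKDTree stands in for whichever multi-dimensional structure you pick, and the catalog features are random placeholders rather than real star data:

import numpy as np
from scipy.spatial import cKDTree

def star_feature(anchor, s2, s3):
    """6-D descriptor: 3 brightnesses, 2 distances from the anchor star,
    and the angle at the anchor between the other two stars."""
    p1, p2, p3 = np.asarray(anchor[:2]), np.asarray(s2[:2]), np.asarray(s3[:2])
    v2, v3 = p2 - p1, p3 - p1
    d2, d3 = np.linalg.norm(v2), np.linalg.norm(v3)
    angle = np.arccos(np.clip(np.dot(v2, v3) / (d2 * d3), -1.0, 1.0))
    return np.array([anchor[2], s2[2], s3[2], d2, d3, angle])

rng = np.random.default_rng(0)
catalog_features = rng.random((1000, 6))  # placeholder for precomputed catalog descriptors
index = cKDTree(catalog_features)

photo_stars = rng.random((3, 3))          # three (x, y, brightness) rows from the photo
query = star_feature(photo_stars[0], photo_stars[1], photo_stars[2])
dist, idx = index.query(query, k=5)       # 5 nearest catalog candidates
print(idx, dist)

The angle makes the descriptor rotation-free; the two raw distances only work because the image scale is known, so for blind solving they would have to be replaced by scale-invariant ratios.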

Also, you may want to check out http://astrometry.net/

Disclaimer: the PH-tree is my own invention; it works best if you have 100,000+ entries. You would also need to normalize the data first, as sketched below.
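For that normalization, a minimal sketch that rescales each feature dimension to [0, 1] (again on a placeholder feature matrix):

import numpy as np

catalog_features = np.random.random((1000, 6))  # placeholder feature matrix
span = np.ptp(catalog_features, axis=0)         # per-dimension range (max - min)
span[span == 0] = 1.0                           # guard against constant columns
normalized = (catalog_features - catalog_features.min(axis=0)) / span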

TilmannZ
  • Thanks for the comment. Very interesting! A few things: 1 - Noise is a problem; there will always be some extra stars and some missing stars. 2 - The angular resolution in arc seconds per pixel is known. The problem of "blind plate solving", where the arc seconds per pixel is not known, is a larger problem. 3 - I'm familiar with astrometry.net. 4 - TilmannZ, I like your idea; it is very similar to what I have been thinking. 5 - However the pattern of stars is detected, it is going to be "fuzzy". That is where I was thinking a graph DB might be helpful, although I've never used a graph DB in that way before. – W. Christopher Moses Jan 14 '16 at 15:41
  • I think graph DBs are more useful if you want to navigate a graph. A graph DB should be able to tell you quickly the shortest path between two nodes, how many nodes can be reached from a given node within 3 hops, or whether a set of nodes and edges forms several disconnected graphs or just one single graph. – TilmannZ Jan 16 '16 at 13:11
  • Usually graph DBs don't have a notion of 'patterns' (e.g. 3 nodes forming a triangle); at best you can express 'distance' by putting a weight on the edges. But since there is also no notion of (Euclidean) space, one 'side' of a triangle may for example be 100x the length of the other two sides combined. Maybe the notion of Euclidean space can be added, but I suppose the solution with the brightest star and its neighbours would be much faster. – TilmannZ Jan 16 '16 at 13:11

For a shorter explanation, please see sorting out same star from DAOStarFinder Tables.

My basic idea is:

  1. Choose the best-quality list as the reference

  2. Align each frame's list to the reference, by angle match and a transformation matrix (see the sketch after this list)

  3. Sort the stars near the reference stars
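As a compact illustration of step 2 (the coordinates below are toy values, not real data), astroalign can recover the transformation directly from two point sets; the full script below does the triangle matching by hand first and only then calls astroalign:

import numpy as np
import astroalign as aa

# Toy source points, and the same points rotated by 30 degrees and shifted.
src = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.], [5., 3.]])
theta = np.deg2rad(30.)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
dst = src @ R.T + np.array([20., 30.])

# find_transform recovers the similarity transform mapping src onto dst.
transf, (matched_src, matched_dst) = aa.find_transform(src, dst)
print(transf.rotation, transf.scale, transf.translation)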

An example of aligning stars

The whole sorting algorithm I use looks like:

#===================================================================
#
#                       STARS SORTING
#
# This script is used to sort stars from the table file of each figure.
#
# PROCESS:
# 1) Choose Reference Frame
#    select the best-quality frame (the one containing the most stars) as reference,
#    pick the several most luminous stars (lum_ref_star_limit) as coordinate references
# 2) Align Coordinates
#    compare each frame to the reference stars with the nearest triangle pattern,
#    derive the transformation matrix from the pattern, align all the coordinates in that figure
# 3) Sort Stars
#    iteratively select each star from the reference frame, search in all the other frames within a coordinate tolerance
#
#
# VERSION: 24 Sep 2020
# AUTHOR: QIANG CHEN chen@camk.edu.pl
#
# PYTHON ENVIRONMENT REQUIREMENTS:
# - pip install astropy
# - pip install --no-deps ccdproc
# - pip install photutils
# alternatively (at CAMK):
#   source /home/chen/python-chen/chen3.6/bin/activate
#
# REFERENCES:
# - COMPARISON STARS https://nbviewer.jupyter.org/gist/dokeeffe/416e214f134d39db696c7fdac261964b
# - SEARCHING STAR TRIANGLE PATTERNS https://stackoverflow.com/q/43126580/13720066
# - ALIGN POINTS by homogeneous transformation matrix https://astroalign.readthedocs.io/en/latest/
#===================================================================


def getTriangles(set_X, X_combs):
    """
    Inefficient way of obtaining the lengths of each triangle's side.
    Normalized so that the minimum length is 1.
    """
    triang = []
    for p0, p1, p2 in X_combs:
        d1 = np.sqrt((set_X[p0][0] - set_X[p1][0]) ** 2 +
                     (set_X[p0][1] - set_X[p1][1]) ** 2)
        d2 = np.sqrt((set_X[p0][0] - set_X[p2][0]) ** 2 +
                     (set_X[p0][1] - set_X[p2][1]) ** 2)
        d3 = np.sqrt((set_X[p1][0] - set_X[p2][0]) ** 2 +
                     (set_X[p1][1] - set_X[p2][1]) ** 2)
        d_min = min(d1, d2, d3)
        d_unsort = [d1 / d_min, d2 / d_min, d3 / d_min]
        triang.append(sorted(d_unsort))
    return triang
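
# Sanity check for getTriangles (toy input, not part of the pipeline): any
# equilateral triangle normalizes to sides [1, 1, 1], so the descriptor is
# invariant to rotation and scale of the frame:
#   pts = np.array([[0., 0.], [2., 0.], [1., np.sqrt(3.)]])
#   getTriangles(pts, [(0, 1, 2)])   # -> [[1.0, 1.0, 1.0]]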

def sumTriangles(ref_triang, in_triang):
    """
    For each normalized triangle in ref, compare with each normalized triangle
    in B. find the differences between their sides, sum their absolute values,
    and select the two triangles with the smallest sum of absolute differences.
    """
    tr_sum, tr_idx = [], []
    for i, ref_tr in enumerate(ref_triang):
        for j, in_tr in enumerate(in_triang):
            # Absolute value of lengths differences.
            tr_diff = abs(np.array(ref_tr) - np.array(in_tr))
            # Sum the differences
            tr_sum.append(sum(tr_diff))
            tr_idx.append([i, j])
    # Index of the triangles in ref and in with the smallest sum of absolute
    # length differences.
    tr_idx_min = tr_idx[tr_sum.index(min(tr_sum))]
    ref_idx, in_idx = tr_idx_min[0], tr_idx_min[1]
    return ref_idx, in_idx, min(tr_sum)



import os
import glob
import itertools
import numpy as np
import matplotlib.pyplot as plt
import astroalign as aa

PATH = '/work/chuck/chen/obs'

# The later assignment wins: True aligns the raw lists, False reuses existing aligned_* files.
ALIGN_COOR = False
ALIGN_COOR = True

# The last assignment wins; reorder these lines to pick the dataset to process.
folder,name,deblending,plot = '20190822','stars',False,False
folder,name,deblending,plot = '20190823','ap',False,False
folder,name,deblending,plot = '20190827','ap',False,False
folder,name,deblending,plot = '20190822','ap',False,False

offset = 10 # better >= 10
len_threshold = 10
lum_ref_star_limit = 10
# >= 3, better 5-10; bad figure quality needs a bigger value, but too large gets slow
in_star_limit = 5 # >= 3
flux_ratio = 15   # accept flux within [flux_ref/flux_ratio, flux_ref*flux_ratio]
align_accuracy_limit = 5e-3
filters = ['V','B','U','I']

try:
  print('deleting old data')
  os.system(f'rm {PATH}/{folder}/reduced/sorted_{name}_*.dat')
  if(ALIGN_COOR): 
    print('delete align')
    os.system(f'rm {PATH}/{folder}/reduced/aligned_{name}_light_*.dat')
except:
  pass



refs = []
rows = []
datas = {}
align_fails = []
total_files = 0
for filt in filters:
  print('SORTING STARS BY EVERY FILTER TYPE:',filt) 

  if(ALIGN_COOR):
    files = glob.glob(f'{PATH}/{folder}/reduced/{name}_light_*{filt}*.dat')
  else:
    files = glob.glob(f'{PATH}/{folder}/reduced/aligned_{name}_light_*{filt}*.dat')
  total_files+=len(files)

  print('  reading data and selecting the figure with the most stars as reference')
  row_ref = 0
  data_ref = np.array([])
  for fil in files:
    f = open(fil,'r')
    next(f) # skip header line
    data = np.array([[float(data) for data in line.split()] for line in f.readlines()])
    f.close()
    datas[fil] = data
    row,col = data.shape
    print(f'    {fil} contains {row} stars, time {data[0,-1]}')
    if (row >= row_ref):
      row_ref = row
      data_ref = data
      ref = fil

  if (data_ref.size):
    print('  sorting reference frame by descending flux')
    if(name=='ap' and deblending):
      data_ref = data_ref[data_ref[:,6].argsort()[::-1]]
    else:
      data_ref = data_ref[data_ref[:,3].argsort()[::-1]]
  else:
    print('  NO RESULTS')
    continue
  refs.append(ref)
  rows.append(row_ref)


  if(ALIGN_COOR):
    print('  Aligning Coordinates')
    for k in range(len(files)):
      fil = files[k]
      data = datas[fil]
      row,col = data.shape
      if(row<in_star_limit):
        continue # too few stars in this frame to form triangles; skip it
      print(f'  aligning coordinates: {k+1}/{len(files)}\n    reference: {ref}\n    input: {fil}')
      if(name=='ap' and deblending):
        xcentroid = data_ref[:,4]
        ycentroid = data_ref[:,5]
        x = data[:,4]
        y = data[:,5]
      else:
        xcentroid = data_ref[:,1]
        ycentroid = data_ref[:,2]
        x = data[:,1]
        y = data[:,2]

      set_ref = np.array(list(zip(xcentroid[:lum_ref_star_limit],ycentroid[:lum_ref_star_limit])))
      set_in = np.array(list(zip(x,y)))

      # All possible triangles.
      ref_combs = list(itertools.combinations(range(len(set_ref)), 3))
      in_combs = list(itertools.combinations(range(len(set_in)), 3))

      # Obtain normalized triangles.
      ref_triang, in_triang = getTriangles(set_ref, ref_combs), getTriangles(set_in, in_combs)

      # Index of the ref and in triangles with the smallest difference.
      ref_idx, in_idx, accuracy = sumTriangles(ref_triang, in_triang)
      print("    smallest difference, accuracy: {}".format(accuracy))
      if (accuracy>align_accuracy_limit):
        print('    quality too low, reject')
        align_fails.append(fil)
        continue

      # Indexes of points in ref and in of the best match triangles.
      ref_idx_pts, in_idx_pts = ref_combs[ref_idx], in_combs[in_idx]
      print ('    triangle ref %s matches triangle in %s' % (ref_idx_pts, in_idx_pts))

      print ("    ref:", [set_ref[_] for _ in ref_idx_pts])
      print ("    input:", [set_in[_] for _ in in_idx_pts])

      ref_pts = np.array([set_ref[_] for _ in ref_idx_pts])
      in_pts = np.array([set_in[_] for _ in in_idx_pts])

      transf, (in_list,ref_list) = aa.find_transform(in_pts, ref_pts)
      transf_in = transf(set_in)
      print(f'    {transf}')    
      if plot:
        plt.scatter(set_ref[:,0],set_ref[:,1], s=100,marker='.', c='r',label='Reference')
        plt.scatter(set_in[:,0],set_in[:,1], s=100,marker='^', c='b',label='Input')
        plt.scatter(transf_in[:,0],transf_in[:,1], s=200,marker='+', c='b',label='Input Aligned')
        plt.plot(ref_pts[:,0],ref_pts[:,1], c='r')
        plt.plot(in_pts[:,0],in_pts[:,1], c='b')
        plt.legend()
        plt.show()
      datas[fil] = np.append(data,transf_in,axis=1)
      head, tail = os.path.split(fil)
      np.savetxt(f'{head}/aligned_{tail}',datas[fil], fmt='%f ', newline='\n')



  print('  Sorting Stars')
  print(f'  compare to the reference star, search in files within tolerance. Type: {name}')
  ii = 0
  for i in range(row_ref):
    onestar = []
    print(f'    star {i+1}/{row_ref} in reference: {ref}')
    if(name=='ap' and deblending):
      xcentroid = data_ref[i,4]
      ycentroid = data_ref[i,5]
      flux_ref = data_ref[i,6]
    else:
      xcentroid = data_ref[i,1]
      ycentroid = data_ref[i,2]
      flux_ref = data_ref[i,3]
    for k in range(len(files)):
      fil = files[k]
      data = datas[fil]
      #print(f'    sorting: {k+1}/{len(files)}\n      reference: {ref}\n      ori input: {fil}')
      row,col = data.shape
      for j in range(row):
        if(name=='ap' and deblending):
          x = data[j,-2]
          y = data[j,-1]
          flux = data[j,6]
        else:
          x = data[j,-2]
          y = data[j,-1]
          flux = data[j,3]
        if (xcentroid-offset<x<xcentroid+offset) and (ycentroid-offset<y<ycentroid+offset) and (1./flux_ratio<flux/flux_ref<flux_ratio):  
          onestar.append(data[j,:])
    if(len(onestar)<len_threshold):
      print('      not enough data for this star, skip')
      continue
    else:
      ii+=1
      onestar = np.array(onestar)
      print('      sorting data by ascending time')
      onestar = onestar[onestar[:,-3].argsort()]
      np.savetxt(f'{PATH}/{folder}/reduced/sorted_{name}_{filt}{ii}.dat', onestar, fmt='%f ', newline='\n')


print('Reference figures with most stars found:')
for k in range(len(refs)):
  ref = refs[k]
  num = rows[k]
  print('  in figure',ref,'found',num,'stars')

for fail in align_fails:
  print('  align fails in', fail)
print(f'align fail total number {len(align_fails)} / {total_files}')

John Chen