I am trying to read n catalogs (data files), take 7 columns from each catalog, and then check whether n*(n-1) "if" conditions hold, using some of those 7 columns. If the conditions are true, I do some math; otherwise I do nothing.
So for example, if I am comparing two catalogs, I have 2 conditions to test, and if I have 3 catalogs, I have 6 conditions to check.
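To make the count concrete: each unordered pair of catalogs contributes one RA test and one Dec test, so n catalogs give 2 * C(n, 2) = n*(n-1) tests. Below is a minimal sketch that enumerates them with `itertools.combinations`. The function name `pairwise_conditions` and the tuple layout are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def pairwise_conditions(n_catalogs):
    """List the (catalog_a, catalog_b, coordinate) tests for n catalogs.

    Catalogs are numbered from 1. Each unordered pair gets one RA test
    and one Dec test, mirroring the comparisons described above.
    """
    tests = []
    for a, b in combinations(range(1, n_catalogs + 1), 2):
        for coord in ("ra", "dec"):
            tests.append((a, b, coord))
    return tests


# Print the number of tests for 2, 3 and 4 catalogs.
for n in (2, 3, 4):
    print(n, len(pairwise_conditions(n)))
```

Generating the tests from the catalog count like this is one possible way to avoid hand-editing the condition list whenever a catalog is added.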
Each catalog has roughly 10,000 rows and around 40 columns, but the number of rows generally differs from one catalog to the next.
Currently, I have working code for 3 catalogs, where I loop over the three catalogs with nested for loops and apply my 6 conditions.
Here is an example of my code:
    import numpy as np
    from astropy.io import ascii, fits
    from astropy.table import Table

    path = "xx"  # Location of all input files.
    # file2 and file3 hold the input file names and are defined elsewhere.
    cat1 = ascii.read(path + file3, guess=False)
    data2 = fits.getdata(path + file2, 1)
    cat2 = Table(data2)
    cat3 = Table.read(path + 'xyz.tbl', format='ipac')

    for i in range(len(cat1)):
        (ra1, dec1, flux1, flux1error, maj1, minor1, ang1) = (
            cat1['RA_Degrees'][i], cat1['DEC_Degrees'][i],
            cat1['fitted_total_flux'][i], cat1['fitted_total_flux_error'][i],
            cat1['BMajor_Degrees'][i], cat1['BMinor_Degrees'][i],
            cat1['position_angle_deg'][i])
        ang1 = ang1 * np.pi / 180  # degrees to radians
        for j in range(len(cat2)):
            (ra2, dec2, total_cat2, total_error_cat2, maj2, min2, pa2) = (
                cat2['ra'][j], cat2['dec'][j],
                cat2['total'][j], cat2['total_err'][j],
                cat2['BMajor'][j], cat2['Bminor'][j],
                cat2['Position Angle'][j])
            for k in range(len(cat3)):
                # Use separate names for catalog 3 so the catalog 2 fluxes
                # are not overwritten.
                (ra3, dec3, total_cat3, total_error_cat3, maj3, min3, pa3) = (
                    cat3['ra'][k], cat3['dec'][k],
                    cat3['flux'][k], cat3['ferr'][k],
                    cat3['bmaj'][k], cat3['bmin'][k], cat3['pa'][k])
                # All six pairwise RA/Dec tests must hold.
                if np.all(
                    np.all(np.abs(ra2 - ra1) < maj1 + maj2) and
                    np.all(np.abs(dec2 - dec1) < maj1 + maj2) and
                    np.all(np.abs(ra3 - ra2) < maj2 + maj3) and
                    np.all(np.abs(dec3 - dec2) < maj2 + maj3) and
                    np.all(np.abs(ra3 - ra1) < maj1 + maj3) and
                    np.all(np.abs(dec3 - dec1) < maj1 + maj3)
                ):
                    pass  # the math for a matched triple goes here (omitted)
I have two problems related to this:
- I would like to generalize this to any number of catalogs. Currently, I have to edit the code by hand depending on whether I have 2, 3 or 4 catalogs, which is annoying.
- A 2-catalog match takes up to 33 minutes to execute, but the 3-catalog match has now been running for 2 days. Is there any way to speed this up?
For the first problem, I looked into recursive functions, but my question is whether I can use one here, since the number of conditions to be checked also depends on n and the column names are generally not consistent across catalogs. For example, one catalog may call Right Ascension 'RA', while another may call it 'ra' or 'Right Ascension'.
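One way to handle the inconsistent column names is to translate each catalog into a common set of names before any comparison, so the matching logic only ever sees 'ra', 'dec' and 'maj'. Below is a minimal sketch of that idea. The `COLUMN_MAPS` dictionary and the `standardize` helper are hypothetical names, the native column names are taken from the code above, and the sample values are made up for illustration. A plain dict stands in for a catalog here; anything that supports `catalog[column_name]` lookup should behave the same way.

```python
# Hypothetical per-catalog maps: standard name -> that catalog's own column name.
COLUMN_MAPS = {
    "cat1": {"ra": "RA_Degrees", "dec": "DEC_Degrees", "maj": "BMajor_Degrees"},
    "cat2": {"ra": "ra", "dec": "dec", "maj": "BMajor"},
    "cat3": {"ra": "ra", "dec": "dec", "maj": "bmaj"},
}


def standardize(catalog, column_map):
    """Return a dict keyed by standard names, pulling each column from `catalog`."""
    return {std: catalog[native] for std, native in column_map.items()}


# Made-up sample rows standing in for the first catalog.
raw_cat1 = {
    "RA_Degrees": [10.0, 10.5],
    "DEC_Degrees": [-5.0, -5.2],
    "BMajor_Degrees": [0.01, 0.02],
    "unused_column": [1, 2],
}

std_cat1 = standardize(raw_cat1, COLUMN_MAPS["cat1"])
print(sorted(std_cat1))  # ['dec', 'maj', 'ra']
```

With every catalog reduced to the same keys, a single loop over a list of catalogs can replace the hand-written per-catalog unpacking.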
For the second problem, I was trying to use multiprocessing, following the documentation:
https://docs.python.org/2/library/multiprocessing.html
I would like to know whether, for multiprocessing, it is better to stick with nested for loops or to switch to a recursive function. Any advice would be appreciated.
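On the timing: with roughly 10,000 rows per catalog, two nested loops visit about 10^8 pairs and three nested loops visit about 10^12 triples. At the rate implied by the 33-minute two-catalog run (about 5 x 10^4 iterations per second), the three-catalog loop would need on the order of 200 days, assuming the cost per iteration stays the same. A different technique from multiprocessing that may help is to vectorize the pairwise test with NumPy broadcasting. The sketch below is not the code above, only an illustration of that idea: `match_pairs` is a hypothetical helper, the sample coordinates are made up, and it applies the same two tests as the conditions in the question.

```python
import numpy as np


def match_pairs(ra_a, dec_a, maj_a, ra_b, dec_b, maj_b):
    """Return index arrays (i, j) where row i of A and row j of B pass both tests.

    Tests: |ra_b - ra_a| < maj_a + maj_b and |dec_b - dec_a| < maj_a + maj_b.
    """
    ra_a, dec_a, maj_a = (np.asarray(x, dtype=float) for x in (ra_a, dec_a, maj_a))
    ra_b, dec_b, maj_b = (np.asarray(x, dtype=float) for x in (ra_b, dec_b, maj_b))
    # Broadcasting builds len(A) x len(B) arrays without explicit Python loops.
    limit = maj_a[:, None] + maj_b[None, :]
    close = (np.abs(ra_b[None, :] - ra_a[:, None]) < limit) & (
        np.abs(dec_b[None, :] - dec_a[:, None]) < limit
    )
    return np.nonzero(close)


# Made-up sample data: two sources in A, three in B.
i, j = match_pairs(
    [10.0, 20.0], [0.0, 5.0], [0.1, 0.1],
    [10.05, 30.0, 20.1], [0.05, 5.0, 5.1], [0.1, 0.1, 0.1],
)
print(list(zip(i.tolist(), j.tolist())))  # [(0, 0), (1, 2)]
```

For two 10,000-row catalogs, each intermediate float array holds 10^8 values (about 800 MB as float64), so processing catalog A in chunks of rows may be necessary. For three or more catalogs, one option is to match pairs first and build triples only from the pairs that survive, which avoids visiting the full 10^12 combinations.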