
I have data from two different catalogs and want to match them by coordinates. Catalog 1 provides x1, y1, z1, a1, b1, c1, etc. (about half a million elements), and catalog 2 provides x2, y2, z2, a2, e2, m2, n2, etc. (about a million elements). My plan is to build a 2D array of (x, y) coordinates for each catalog, extending to (x, y, z) if necessary, and then compare the two arrays to find the rows they have in common.

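import numpy as np

co1 = np.vstack((x1,y1)).T   # shape (N1, 2), one (x, y) pair per row
co2 = np.vstack((x2,y2)).T   # shape (N2, 2)

idx1 = np.in1d(co1,co2)   # does not work: np.in1d flattens its inputs, so it compares single values, not rows
idx2 = np.in1d(co2,co1)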

np.savetxt('combined_data.txt',np.c_[x1[idx1],y1[idx1],a1[idx1],e2[idx2],n2[idx2]],fmt='%1.4f   %1.4f   %1.4f   %1.4f   %1.4f')

For example, I have the following dataset:

x1 = np.array([1,2,3,4,5])
y1 = np.array([5,4,3,2,1])
x2 = np.array([1,4,6,2,6,4,8,9,3])
y2 = np.array([5,1,5,3,6,2,8,3,3])

(1,5), (3,3), (4,2) are the common coordinates between the two catalogs. Therefore,

idx1 = [True, False, True, True, False] and idx2 = [True, False, False, False, False, True, False, False, True].

The problem is that np.in1d is a 1D routine and cannot be applied to 2D or 3D arrays. Does anyone know a NumPy routine that can accomplish this task?

Huanian Zhang
  • `scipy.spatial.cKDTree` will give you fast n nearest neighbour lookup... – Benjamin Dec 19 '16 at 02:29
  • Stack with: `xy1 = np.column_stack((x1,y1)); xy2 = np.column_stack((x2,y2))` and then use the approaches listed at the linked dup target to get the row indices; indexing the array being searched with those indices should give you the desired output. – Divakar Dec 19 '16 at 13:04
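
A minimal NumPy-only sketch of the row-matching idea from the comment above, using the example data from the question. It is not necessarily one of the approaches at the linked duplicate: the assumption here is that labelling every distinct (x, y) row with an integer via `np.unique(..., axis=0)` is acceptable, after which the 1D routine `np.isin` can be applied to the labels. It only finds exact matches, so floating-point coordinates that differ by rounding will not be paired.

```python
import numpy as np

# Example data from the question
x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([5, 4, 3, 2, 1])
x2 = np.array([1, 4, 6, 2, 6, 4, 8, 9, 3])
y2 = np.array([5, 1, 5, 3, 6, 2, 8, 3, 3])

xy1 = np.column_stack((x1, y1))
xy2 = np.column_stack((x2, y2))

# Give every distinct (x, y) row one integer label shared by both catalogs.
both = np.vstack((xy1, xy2))
_, labels = np.unique(both, axis=0, return_inverse=True)
labels = labels.reshape(-1)  # guard against versions that return a 2D inverse

labels1 = labels[:len(xy1)]
labels2 = labels[len(xy1):]

# The labels are 1D, so the 1D membership test now compares whole rows.
idx1 = np.isin(labels1, labels2)
idx2 = np.isin(labels2, labels1)

print(idx1)
print(idx2)
```

The same code extends to (x, y, z) by stacking a third column. Note that `x1[idx1]` and `x2[idx2]` list the common rows in each catalog's own order, so the two selections are not guaranteed to line up row by row.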

1 Answer


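Convert the coordinate arrays of both catalogs to pandas DataFrames (after `import pandas as pd`); `.reset_index()` keeps each row's original position in an `index` column:

df1 = pd.DataFrame({"x" : x1, "y" : y1}).reset_index()
df2 = pd.DataFrame({"x" : x2, "y" : y2}).reset_index()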

merge them:

result = pd.merge(df1, df2, left_on=["x","y"], right_on=["x","y"])
#   index_x  x  y  index_y
#0        0  1  5        0
#1        2  3  3        8
#2        3  4  2        5

and get the indexes:

result[["index_x","index_y"]]
#   index_x  index_y
#0        0        0
#1        2        8
#2        3        5
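
As a hedged follow-up, here is one way the merge result could be turned into the boolean masks `idx1` and `idx2` from the question. The mask construction is an addition for illustration, not part of the original answer; the `index_x` and `index_y` column names come from the default merge suffixes applied to the two `index` columns.

```python
import numpy as np
import pandas as pd

# Example data from the question
x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([5, 4, 3, 2, 1])
x2 = np.array([1, 4, 6, 2, 6, 4, 8, 9, 3])
y2 = np.array([5, 1, 5, 3, 6, 2, 8, 3, 3])

df1 = pd.DataFrame({"x": x1, "y": y1}).reset_index()
df2 = pd.DataFrame({"x": x2, "y": y2}).reset_index()
result = pd.merge(df1, df2, on=["x", "y"])  # inner join on both coordinates

# Row positions of each matched pair, aligned pair by pair
pos1 = result["index_x"].to_numpy()
pos2 = result["index_y"].to_numpy()

# Boolean masks in the form asked for in the question
idx1 = np.zeros(len(x1), dtype=bool)
idx1[pos1] = True
idx2 = np.zeros(len(x2), dtype=bool)
idx2[pos2] = True
```

When combining columns from both catalogs, indexing with `pos1` and `pos2` is probably the safer choice: `x1[pos1]` and `x2[pos2]` are aligned pair by pair, whereas the masks return matches in each catalog's own order (here rows 0, 5, 8 of catalog 2 rather than 0, 8, 5).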
DYZ
  • Thanks. Actually, I need the index from each array. I tried adding `left_index = True` and `right_index = True`, but `pd.merge` returns some errors. Do you have any idea how to fix it? – Huanian Zhang Dec 19 '16 at 03:16
  • When creating the dataframe, add a call to `.reset_index()` at the end. This will copy the current index into a separate column. – DYZ Dec 19 '16 at 03:38
  • I tried, but it returns an error: `ValueError: Big-endian buffer not supported on little-endian compiler`. I have already changed the coordinates like this: `df1 = pd.DataFrame({"x" : np.array(ra_sdss).byteswap().newbyteorder(), "y" : np.array(dec_sdss).byteswap().newbyteorder()}).reset_index()`. I assume this error arises from the memory. – Huanian Zhang Dec 19 '16 at 08:44
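
The error message in the last comment refers to byte order: the input arrays appear to be stored big-endian while the machine is little-endian. A minimal sketch of one way to convert such arrays to native byte order before building the DataFrame follows. The helper name `to_native` and the sample values are hypothetical, and `ra_big` merely stands in for an array like `ra_sdss`; whether this resolves the commenter's case is an assumption.

```python
import numpy as np

def to_native(arr):
    """Return arr with the same values in the machine's native byte order."""
    arr = np.asarray(arr)
    if arr.dtype.isnative:
        return arr  # already native, nothing to do
    # astype converts the stored bytes while keeping the values unchanged
    return arr.astype(arr.dtype.newbyteorder("="))

# Hypothetical stand-in for a big-endian catalog column
ra_big = np.array([10.5, 20.25, 30.0], dtype=">f8")
ra_native = to_native(ra_big)

print(ra_big.dtype, "->", ra_native.dtype)
```

Unlike chaining `.byteswap().newbyteorder()`, this relies only on `astype` and `dtype.newbyteorder`, and it leaves arrays that are already native untouched.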