Compare two numpy arrays by first column and create a third numpy array by concatenating two arrays

Question

I have two 2D NumPy arrays that I use to plot simulation results.

In both arrays a and b, the first column holds the time intervals and the second column holds the data to be plotted. The two arrays have different shapes: a is (500, 2) and b is (600, 2). I want to compare the two arrays on their first column and build a third array from the rows of a, adding as a third column the value from b whose time matches. If no match is found, the third column should be 0.

Is there a NumPy trick to do this?

For instance:

a = [[0.002, 0.998],
     [0.004, 0.997],
     [0.006, 0.996],
     [0.008, 0.995],
     [0.010, 0.993]]

b = [[0.002,  0.666],
     [0.004,  0.665],
     [0.0041, 0.664],
     [0.0042, 0.664],
     [0.0043, 0.664],
     [0.0044, 0.663],
     [0.0045, 0.663],
     [0.0005, 0.663],
     [0.006,  0.663],
     [0.0061, 0.662],
     [0.008,  0.661]]

Expected output:

c = [[0.002, 0.998, 0.666],
     [0.004, 0.997, 0.665],
     [0.006, 0.996, 0.663],
     [0.008, 0.995, 0.661],
     [0.010, 0.993, 0]]

Can you put `a` and `b` on different lines to make it easy to copy the data into IPython? — Ahmed Fasih, Sep 01 '14 at 07:33
`a` has 6e-3 while `b` has 6e-4, while `c` has 6e-3 again, in their first columns. Is this a data entry error? — Ahmed Fasih, Sep 01 '14 at 07:42
c is the expected output; a and b are the two inputs. The first columns of a and b are time intervals like [0.002, 0.004], and the third array c must contain the matching time intervals of a and b together with their data from the second columns. — arun, Sep 01 '14 at 07:45
`a[2,0]` is approx _6*10**-3_. But `b[9,0]` is _6*10**-4_. Factor of 10 different. I think you have a data entry error. — Ahmed Fasih, Sep 01 '14 at 07:49
That is the sample I am getting in the time interval, but you can ignore it. You can try a different example, changing the data so there is no data entry error, and produce the third numpy array. — arun, Sep 01 '14 at 07:55
How do you compare the first index of `a` with `b` and insert the second index? In your `c` array, 0.002 and 0.004 have a match in b, but 0.006 is not equal to 0.0006! — Mazdak, Sep 01 '14 at 08:01
Some advice: make it _easy_ for people to answer your question. Include _complete_ and _correct_ test cases in a way that's _easy_ to work with. Don't ask people for help and then ask them to "change the datas" and figure out what exactly you want, although in this case it is obvious enough. Some day soon, you'll be answering questions on this site, so ask questions in a way that you will find easy to answer. — Ahmed Fasih, Sep 01 '14 at 08:04

score 2 · Answer 1 · answered Sep 01 '14 at 07:51

A quick solution I can think of:

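import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])

b = np.array([[0.002, 0.666],
              [0.004, 0.665],
              [0.0041, 0.664],
              [0.0042, 0.664],
              [0.0043, 0.664],
              [0.0044, 0.663],
              [0.0045, 0.663],
              [0.0005, 0.663],
              [0.0006, 0.663],
              [0.00061, 0.662],
              [0.0008, 0.661]])

# For each row of a, look for an identical time in the first column of b
c = []
for row in a:
    index = np.where(b[:, 0] == row[0])[0]
    if np.size(index) != 0:
        # match found: append the value from the first matching row of b
        c.append([row[0], row[1], b[index[0], 1]])
    else:
        # no match: fill the third column with 0
        c.append([row[0], row[1], 0])

c = np.array(c)  # turn the list of rows into a (len(a), 3) array
print(c)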

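As pointed out in the comments above, there seems to be a data entry error: in this `b`, 0.006 and 0.008 appear as 0.0006 and 0.0008, so only the times 0.002 and 0.004 find a match and the remaining rows get 0.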
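The loop above scans all of `b` once for every row of `a`. A minimal sketch of the same exact-equality logic that builds a dictionary from `b` once and then does constant-time lookups; `merge_on_time` and its `fill` parameter are hypothetical names, and the sample arrays are small toy data rather than the question's full example.

```python
import numpy as np


def merge_on_time(a, b, fill=0.0):
    """Return a with a third column holding, for each row, the b-value whose
    time is exactly equal, or `fill` when no time in b matches."""
    lookup = {}
    for t, v in b:
        # keep the first occurrence, like index[0] in the loop above
        lookup.setdefault(t, v)
    third = [lookup.get(t, fill) for t in a[:, 0]]
    return np.column_stack([a, third])


a = np.array([[0.002, 0.998], [0.004, 0.997], [0.010, 0.993]])
b = np.array([[0.002, 0.666], [0.0041, 0.664], [0.004, 0.665]])
c = merge_on_time(a, b)
print(c.tolist())  # [[0.002, 0.998, 0.666], [0.004, 0.997, 0.665], [0.01, 0.993, 0.0]]
```

Like the loop, this relies on exact float equality of the times; it only changes the lookup cost, not the matching rule.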

Ahmed Fasih · Answer 2 · answered Sep 01 '14 at 08:13

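import numpy as np

# a and b as in the first answer; each is assumed sorted by time with unique times
i = np.intersect1d(a[:, 0], b[:, 0])        # times present in both arrays
overlap = np.vstack([i, a[np.isin(a[:, 0], i), 1], b[np.isin(b[:, 0], i), 1]]).T
underlap = np.setdiff1d(a[:, 0], b[:, 0])   # times found only in a
underlap = np.vstack([underlap, a[np.isin(a[:, 0], underlap), 1],
                      np.zeros_like(underlap)]).T
fast_c = np.vstack([overlap, underlap])
fast_c = fast_c[np.argsort(fast_c[:, 0])]   # put the rows back in time order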

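This works by taking the intersection of the first columns of a and b with intersect1d, and then using in1d (`np.isin` in current NumPy) as a boolean mask to pick out the second-column values of the rows whose times lie in that intersection. This assumes each array is sorted by time with no repeated times; otherwise the columns of overlap can end up misaligned or of different lengths.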
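To see what the two building blocks return, here is a small sketch on hypothetical toy time columns (not the question's data). It uses `np.isin`, which performs the same elementwise membership test as `np.in1d`; the latter is deprecated as of NumPy 2.0.

```python
import numpy as np

# Hypothetical time columns
ta = np.array([0.002, 0.004, 0.006, 0.008])
tb = np.array([0.002, 0.0041, 0.006, 0.009])

common = np.intersect1d(ta, tb)   # sorted unique values present in both
mask_a = np.isin(ta, common)      # True where an element of ta is in common
print(common.tolist())  # [0.002, 0.006]
print(mask_a.tolist())  # [True, False, True, False]
```

Indexing a data column with `mask_a` then keeps only the rows whose time has a partner.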

vstack stacks its 1-D inputs as the rows of a new array, so the transpose (`.T`) is needed to get one row per time with three columns. `.T` returns a view rather than a copy, so it is a very fast operation.
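A small sketch of the shape bookkeeping, using hypothetical two-row data. `np.column_stack` is shown as an equivalent one-step alternative to `np.vstack(...).T` when all inputs are 1-D.

```python
import numpy as np

times = np.array([0.002, 0.004])
col_a = np.array([0.998, 0.997])
col_b = np.array([0.666, 0.665])

stacked = np.vstack([times, col_a, col_b])  # shape (3, 2): one row per input
rows = stacked.T                            # shape (2, 3): one row per time
same = np.column_stack([times, col_a, col_b])  # same result in one call

print(stacked.shape, rows.shape)  # (3, 2) (2, 3)
print(rows.tolist())  # [[0.002, 0.998, 0.666], [0.004, 0.997, 0.665]]
```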

Then find the times in a that are not in b with setdiff1d, and complete those rows by putting 0 in the third column. Finally, vstack joins the matched and unmatched rows into one array.
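The "no match" half on its own, as a minimal sketch on hypothetical data. Note that `np.setdiff1d` returns sorted unique values, so lining them up with the masked values of `a` assumes `a` is sorted by time with no repeated times.

```python
import numpy as np

# Hypothetical inputs; a is assumed sorted by time with unique times
a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996]])
tb = np.array([0.002, 0.004])            # time column of b

missing = np.setdiff1d(a[:, 0], tb)      # sorted unique times in a but not in tb
vals = a[np.isin(a[:, 0], missing), 1]   # a's data for those times, in a's order
rows = np.column_stack([missing, vals, np.zeros_like(missing)])
print(rows.tolist())  # [[0.006, 0.996, 0.0]]
```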

Evaluating `fast_c` with the `b` from the first answer (where 0.006 and 0.008 appear as 0.0006 and 0.0008) gives

array([[ 0.002,  0.998,  0.666],
       [ 0.004,  0.997,  0.665],
       [ 0.006,  0.996,  0.   ],
       [ 0.008,  0.995,  0.   ],
       [ 0.01 ,  0.993,  0.   ]])
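The stacking approach lists matched rows before unmatched ones and relies on both arrays being sorted with unique times. A sketch of a different, fully vectorised technique (binary search with `np.searchsorted`, not taken from the answers here) that keeps the row order of `a` and tolerates an unsorted `b`; `merge_sorted` is a hypothetical name, and `b` is assumed non-empty.

```python
import numpy as np


def merge_sorted(a, b, fill=0.0):
    """For each row of a, append the b-value at exactly the same time, else fill.

    Output rows follow a's order; b may be unsorted but must be non-empty.
    """
    order = np.argsort(b[:, 0], kind="stable")  # sort b by time, ties keep b's order
    tb, vb = b[order, 0], b[order, 1]
    idx = np.searchsorted(tb, a[:, 0])          # leftmost insertion point in tb
    idx = np.clip(idx, 0, len(tb) - 1)          # times past the end -> last index
    hit = tb[idx] == a[:, 0]                    # exact match at that position?
    third = np.where(hit, vb[idx], fill)
    return np.column_stack([a, third])


# The question's example data
a = np.array([[0.002, 0.998], [0.004, 0.997], [0.006, 0.996],
              [0.008, 0.995], [0.010, 0.993]])
b = np.array([[0.002, 0.666], [0.004, 0.665], [0.0041, 0.664],
              [0.0042, 0.664], [0.0043, 0.664], [0.0044, 0.663],
              [0.0045, 0.663], [0.0005, 0.663], [0.006, 0.663],
              [0.0061, 0.662], [0.008, 0.661]])
print(merge_sorted(a, b))
```

With the question's a and b this reproduces the expected c, including the 0 for time 0.010. Sorting b costs O(m log m) and each lookup O(log m), which stays cheap for arrays of a few hundred rows.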

score 0 · Answer 3 · answered Oct 25 '17 at 12:42


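The following works for both NumPy arrays and plain Python lists (the `[*x, ...]` unpacking needs Python 3.5+). The matched rows come first, followed by the unmatched rows padded with 0.

# rows of a with an identical time in b, extended with b's value
c = [[*x, y[1]] for x in a for y in b if x[0] == y[0]]
# rows of a with no identical time in b, extended with 0
d = [[*x, 0] for x in a if x[0] not in [y[0] for y in b]]
c.extend(d)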

Someone braver than I am could try to make this one line.
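All of the answers compare times with exact `==`, which can miss times that were computed rather than typed in (for example, 0.1 + 0.2 is not exactly 0.3 in floating point). A sketch of tolerance-based matching with `np.isclose` and broadcasting; `merge_close` is a hypothetical name, and the `atol` value is an assumption that should be chosen well below the sampling step. It builds a len(a) × len(b) boolean table, which is small for arrays of 500 and 600 rows.

```python
import numpy as np


def merge_close(a, b, atol=1e-9, fill=0.0):
    """Like the exact-match answers, but times count as equal within atol."""
    # Boolean table of shape (len(a), len(b)): True where the times agree
    close = np.isclose(a[:, 0][:, None], b[:, 0][None, :], rtol=0.0, atol=atol)
    has_match = close.any(axis=1)
    first = close.argmax(axis=1)  # index of the first True per row (0 if none)
    third = np.where(has_match, b[first, 1], fill)
    return np.column_stack([a, third])


a = np.array([[0.1 + 0.2, 1.0], [0.5, 2.0]])  # 0.1 + 0.2 is not exactly 0.3
b = np.array([[0.3, 9.0]])
print(a[0, 0] == b[0, 0])                # False: exact comparison misses the match
print(merge_close(a, b)[:, 2].tolist())  # [9.0, 0.0]
```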

Iosif Serafeimidis
