
I have two 2d arrays that contain XYZ points, A and B.
Array A has the shape (796704, 3) and is my original pointcloud. Each point is unique except for (0, 0, 0) but those don't matter:

A = [[x_1, y_1, z_1],
     [x_2, y_2, z_2],
     [x_3, y_3, z_3],
     [x_4, y_4, z_4],
     [x_5, y_5, z_5],
     ...]

Array B has the shape (N, 4) and is a cropped version of A (N<796704).
The remaining points did not change and are still equal to their counterpart in A.
The fourth column contains the segmentation value of each point.
The row order of B is completely random and doesn't match A anymore.

B = [[x_4, y_4, z_4, 5],
     [x_2, y_2, z_2, 12],
     [x_6, y_6, z_6, 5],
     [x_7, y_7, z_7, 3],
     [x_9, y_9, z_9, 3]]

I need to reorder the rows of B so that they match the rows of A with the same point and fill in the gaps with a zero row:

B = [[0.0, 0.0, 0.0, 0],
     [x_2, y_2, z_2, 12],         
     [0.0, 0.0, 0.0, 0],
     [x_4, y_4, z_4, 5],
     [0.0, 0.0, 0.0, 0],         
     [x_6, y_6, z_6, 5],
     [x_7, y_7, z_7, 3],
     [0.0, 0.0, 0.0, 0],
     [x_9, y_9, z_9, 3],
     [0.0, 0.0, 0.0, 0],
     [0.0, 0.0, 0.0, 0],
     [0.0, 0.0, 0.0, 0]
     ...]

In the end B should have the shape (796704, 4).

I tried using the numpy_indexed package, as proposed in this very similar question, but the issue here is that B doesn't contain all the points of A:

import numpy_indexed as npi
B[npi.indices(B[:, :-1], A)]

I'm not familiar with numpy and my only solution would be a for-loop, but that would be far too slow for my application. Is there a fast way to solve this problem?

Levaru

3 Answers


Pandas => reindex:

import pandas as pd
import numpy as np

A = np.array([[8, 7, 4],
              [0, 7, 7],
              [4, 7, 0],
              [5, 5, 8],
              [8, 7, 5]])

B = np.array([[8, 7, 4, 2],
           [4, 7, 0, 5],
           [8, 7, 5, 6]])

df_B = (pd.DataFrame(B, columns=["x", "y", "z", "seg"])
            .set_index(["x", "y", "z"])
            .reindex(list(map(tuple, A)))
            .reset_index())
df_B.loc[df_B.seg.isna()] = 0
B = df_B.values

print(B)

Result:

array([[8., 7., 4., 2.],
       [0., 0., 0., 0.],
       [4., 7., 0., 5.],
       [0., 0., 0., 0.],
       [8., 7., 5., 6.]])
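
As a possible alternative, a left merge avoids turning every row of A into a tuple. This is only a sketch: the function name align_with_merge is made up for illustration, it assumes the XYZ rows of B are unique and exactly equal to their counterparts in A, and it has not been timed on the full point cloud.

```python
import numpy as np
import pandas as pd

def align_with_merge(A, B):
    # Hypothetical helper, not part of the original answer.
    cols = ["x", "y", "z"]
    df_a = pd.DataFrame(A, columns=cols)
    df_b = pd.DataFrame(B, columns=cols + ["seg"])
    # A left merge keeps the row order of A. The indicator column
    # marks rows of A without a partner in B as "left_only".
    merged = df_a.merge(df_b, on=cols, how="left", indicator=True)
    out = merged[cols + ["seg"]].to_numpy(dtype=float)
    # Zero out the rows of A that were not found in B.
    out[(merged["_merge"] == "left_only").to_numpy()] = 0
    return out

A = np.array([[8, 7, 4],
              [0, 7, 7],
              [4, 7, 0],
              [5, 5, 8],
              [8, 7, 5]], dtype=float)
B = np.array([[8, 7, 4, 2],
              [4, 7, 0, 5],
              [8, 7, 5, 6]], dtype=float)

result = align_with_merge(A, B)
print(result)
```

Using the indicator column instead of checking seg for NaN means a segmentation value that is itself NaN would not be mistaken for a missing point.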
mcsoini
  • I just tried your solution and the execution takes a very long time, I had to abort after 13 minutes. The shape of A was (796704, 3) and the shape of B was (116987, 4). My points are also float values not integers in case it's important. – Levaru Mar 14 '22 at 12:10

Solving your problem just with numpy:

Case 1

You're working just with numbers:

import numpy as np
A = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3],
              [4, 4, 4],
              [5, 5, 5],
              [6, 6, 6],
              [7, 7, 7],
              [8, 8, 8],
              [9, 9, 9],
              [10,10, 10]
              ])
B = np.array([[4, 4, 4, 5],
              [2, 2, 2, 12],
              [6, 6, 6, 5],
              [7, 7, 7, 3],
              [9, 9, 9, 3]])

c = np.insert(A, 3, 0, axis = 1)
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0))
print(d)

Out:
[[ 4  4  4  5]
 [ 2  2  2 12]
 [ 6  6  6  5]
 [ 7  7  7  3]
 [ 9  9  9  3]
 [ 0  0  0  0]  # previously  1,  1,  1, 0
 [ 0  0  0  0]  # previously  3,  3,  3, 0
 [ 0  0  0  0]  # previously  5,  5,  5, 0
 [ 0  0  0  0]  # previously  8,  8,  8, 0
 [ 0  0  0  0]] # previously 10, 10, 10, 0

Explanation:

1º c will be a copy of A with a new column filled with 0:

c = np.insert(A, 3, 0, axis = 1)

If I print c right now I will get this:

[[ 1  1  1  0]
 [ 2  2  2  0]
 [ 3  3  3  0]
 [ 4  4  4  0]
 [ 5  5  5  0]
 [ 6  6  6  0]
 [ 7  7  7  0]
 [ 8  8  8  0]
 [ 9  9  9  0]
 [10 10 10  0]]

2º Create a new array from B plus the rows of c that are not in B, multiplied by 0.

d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0))

2.1 np.vstack((B,_)) Here I replaced the second argument with _ to make it easier to see what vstack receives: a tuple with the two arrays that you want to concatenate.

2.2 c[np.in1d(c[:,0],B[:,0], invert=True)]*0 Instead of passing all of c, I pass only the rows of c selected by np.in1d(c[:,0],B[:,0], invert=True), multiplied by 0.

2.3 np.in1d(c[:,0],B[:,0], invert=True) np.in1d(c[:,0],B[:,0]) returns a boolean array telling me which x_n of c also exist in B; with invert=True I get the x_n of c that do NOT exist in B. Note that only the first column (x) is compared. (Another way to do the inversion is the tilde operator ~, so ~np.in1d(c[:,0],B[:,0]) gives the same result as np.in1d(c[:,0],B[:,0], invert=True).)
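
The membership test described in 2.3 can be tried on its own. The snippet below is a small sketch that uses np.isin, which newer NumPy releases recommend over np.in1d; the variable names x_a and x_b are made up for illustration and stand for the first columns of c and B.

```python
import numpy as np

# First (x) columns of the example arrays: x_a from c, x_b from B.
x_a = np.arange(1, 11)
x_b = np.array([4, 2, 6, 7, 9])

# True where a value of x_a does NOT occur in x_b.
missing = np.isin(x_a, x_b, invert=True)

# The tilde operator gives the same mask.
same_as_tilde = np.array_equal(missing, ~np.isin(x_a, x_b))

print(x_a[missing])   # the x values of c that have no partner in B
print(same_as_tilde)
```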

Since each point is unique with the exception of the 0,0,0,0 ones, when I do c[np.in1d(c[:,0],B[:,0], invert=True)] I get:

array([[ 1,  1,  1,  0],
       [ 3,  3,  3,  0],
       [ 5,  5,  5,  0],
       [ 8,  8,  8,  0],
       [10, 10, 10,  0]])

If I multiply by 0, I get:

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

So np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0)) concatenates B and the zeroed part of c. B is:

array([[ 4,  4,  4,  5],
       [ 2,  2,  2, 12],
       [ 6,  6,  6,  5],
       [ 7,  7,  7,  3],
       [ 9,  9,  9,  3]])

and the zeroed part of c is the array of 0's above. The final result is:

array([[ 4,  4,  4,  5],
       [ 2,  2,  2, 12],
       [ 6,  6,  6,  5],
       [ 7,  7,  7,  3],
       [ 9,  9,  9,  3],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0]])

Case 2

If you are working with strings and numbers, you can do it this way:

import numpy as np
A = np.array([['x_1', 'y_1', 'z_1'],
     ['x_2', 'y_2', 'z_2'],
     ['x_3', 'y_3', 'z_3'],
     ['x_4', 'y_4', 'z_4'],
     ['x_5', 'y_5', 'z_5'],
     ['x_6', 'y_6', 'z_6'],
     ['x_7', 'y_7', 'z_7'],
     ['x_8', 'y_8', 'z_8'],
     ['x_9', 'y_9', 'z_9'],
     ['x_10', 'y_10', 'z_10']
     ])
B = np.array([['x_4', 'y_4', 'z_4', 5],
     ['x_2', 'y_2', 'z_2', 12],
     ['x_6', 'y_6', 'z_6', 5],
     ['x_7', 'y_7', 'z_7', 3],
     ['x_9', 'y_9', 'z_9', 3]])

c = np.insert(A, 3, 0, axis = 1)
c[np.in1d(c[:,0],B[:,0], invert=True)] = 0

d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]))
print(d)

Out: 
[['x_4' 'y_4' 'z_4' '5']
 ['x_2' 'y_2' 'z_2' '12']
 ['x_6' 'y_6' 'z_6' '5']
 ['x_7' 'y_7' 'z_7' '3']
 ['x_9' 'y_9' 'z_9' '3']
 ['0' '0' '0' '0']
 ['0' '0' '0' '0']
 ['0' '0' '0' '0']
 ['0' '0' '0' '0']
 ['0' '0' '0' '0']]

Explanation:

1º c will be a copy of A with a new column filled with 0:

c = np.insert(A, 3, 0, axis = 1)

If I print c right now I will get this:

[['x_1' 'y_1' 'z_1' '0']
 ['x_2' 'y_2' 'z_2' '0']
 ['x_3' 'y_3' 'z_3' '0']
 ['x_4' 'y_4' 'z_4' '0']
 ['x_5' 'y_5' 'z_5' '0']
 ['x_6' 'y_6' 'z_6' '0']
 ['x_7' 'y_7' 'z_7' '0']
 ['x_8' 'y_8' 'z_8' '0']
 ['x_9' 'y_9' 'z_9' '0']
 ['x_10' 'y_10' 'z_10' '0']]

2º Set the rows of c that don't exist in B to 0:

c[np.in1d(c[:,0],B[:,0], invert=True)] = 0

3º d will be B plus the part of c that was set to 0:

d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]))

Since in this case you're working with strings and numbers in the same array, you can't just multiply everything by 0 when building d. Instead, you set the rows of c to 0 first and then select those zeroed rows.
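
Both cases above append the zero rows at the bottom of d. If the rows of B must instead sit at the same positions as in A, one possible NumPy-only sketch is to give every distinct XYZ row a shared integer id with np.unique and scatter the rows of B accordingly. The helper name align_rows is made up for illustration, and the sketch assumes numeric arrays whose matching points are exactly equal.

```python
import numpy as np

def align_rows(A, B):
    # Hypothetical helper, not part of the answer above.
    n = len(A)
    # Give every distinct XYZ row in A and B a shared integer id.
    stacked = np.vstack((A, B[:, :3]))
    _, ids = np.unique(stacked, axis=0, return_inverse=True)
    ids = ids.ravel()  # guard against a 2D inverse in some NumPy versions
    ids_a, ids_b = ids[:n], ids[n:]
    # For each id, remember which row of B carries it (-1 = not in B).
    row_in_b = np.full(ids.max() + 1, -1)
    row_in_b[ids_b] = np.arange(len(B))
    src = row_in_b[ids_a]
    found = src >= 0
    # Start from zero rows and fill in the rows that exist in B.
    out = np.zeros((n, B.shape[1]), dtype=B.dtype)
    out[found] = B[src[found]]
    return out

A = np.array([[i, i, i] for i in range(1, 11)])
B = np.array([[4, 4, 4, 5],
              [2, 2, 2, 12],
              [6, 6, 6, 5],
              [7, 7, 7, 3],
              [9, 9, 9, 3]])

d = align_rows(A, B)
print(d)
```

Unlike the in1d calls above, this compares whole XYZ rows rather than only the x column. The sort inside np.unique makes it O(n log n), so it should stay far below the cost of a Python loop, although it has not been timed on the full point cloud.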

Useful links:

Code that I based my answer on.

Tilde Operator.

Luigi Minardi
  • I tried your solution, but instead of filling the gaps, the zero rows are appended to the bottom. I need the rows of B to be at the exact same position as in A, with the missing rows set to zero. In your example there should be three zero rows before [ 4 4 4 5] in d. – Levaru Mar 14 '22 at 12:25
  • @Levaru Yeah, they go to the bottom since it is a concatenation of the B array and an array of 0's. I searched and, as far as I could find, numpy cannot retrieve the row indexes directly; there are some ways to get them, but none is good enough to replace the `numpy_indexed` call that you used in your solution, nor do they work with 2D arrays. Good that you found a way to do what you wanted :) – Luigi Minardi Mar 14 '22 at 21:36

I managed to solve this problem by using the numpy_indexed package, which I mentioned in my question.

The solution:

import numpy as np

A = np.array([[8, 7, 4],
              [0, 7, 7],
              [4, 3, 0],                  
              [5, 5, 8],                  
              [3, 9, 5]])

B = np.array([[3, 9, 5, 6],
              [8, 7, 4, 2],
              [4, 3, 0, 5]])

# Create a new, zero-filled, array C with length of A
C = np.zeros((A.shape[0], 4))

# Insert B at the beginning of C
C[:B.shape[0], :B.shape[1]] = B

print(C)

Out:
[[3, 9, 5, 6],
 [8, 7, 4, 2],
 [4, 3, 0, 5],                  
 [0, 0, 0, 0],                  
 [0, 0, 0, 0]]


# Use the numpy_indexed package to reorder the rows.
# Rows of A that are not found in C get the index -1, i.e. the last row of C,
# which is still [0, 0, 0, 0], so the gaps are filled with zero rows.
import numpy_indexed as npi
D = C[npi.indices(C[:, :-1], A, missing=-1)]

print(D)

Out:
[[8, 7, 4, 2],
 [0, 0, 0, 0],
 [4, 3, 0, 5],                  
 [0, 0, 0, 0],                  
 [3, 9, 5, 6]]
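
The fill trick relies only on NumPy's negative indexing, so it can be illustrated without numpy_indexed. In the sketch below, idx is a hand-written stand-in for the index array that the npi.indices call is expected to produce for this example (inferred from the output above, not computed by the package).

```python
import numpy as np

C = np.array([[3, 9, 5, 6],
              [8, 7, 4, 2],
              [4, 3, 0, 5],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)

# Assumed result of the lookup: position of each row of A in C,
# with -1 for the rows of A that are missing from C.
idx = np.array([1, -1, 2, -1, 0])

# Index -1 selects the last row of C, which is a zero row.
D = C[idx]
print(D)
```

This only works while the last row of C is still a zero row, which holds as long as B has fewer rows than A.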
Levaru