Filter two numpy arrays based on pth percentile thresholds

Question

I have two NumPy arrays, x and y, of the same shape (3, 3, 3, 64). I compute the pth percentile of each array and use it as that array's threshold for removing all elements below it.

The aim is to remove the values in 'x' at positions where both conditions hold: x < x_threshold AND y < y_threshold. The code I have is as follows:

import numpy as np

# Create two np arrays of same dimension-
x = np.random.rand(3, 3, 3, 64)
y = np.random.rand(3, 3, 3, 64)

x.shape, y.shape
# ((3, 3, 3, 64), (3, 3, 3, 64))

x.min(), x.max()
# (0.0003979483351387314, 0.9995167558342761)

y.min(), y.max()
# (0.0006328536816179176, 0.9999504057216633)

# Compute 20th percentile as threshold for both np arrays-
x_threshold = np.percentile(x, 20)
y_threshold = np.percentile(y, 20)

print(f"x_threshold = {x_threshold:.4f} & y_threshold = {y_threshold:.4f}")
# x_threshold = 0.2256 & y_threshold = 0.1958

x[x < x_threshold].shape, y[y < y_threshold].shape
# ((346,), (346,))

# For 'x', try to select all elements for which both of these conditions are true-
x[x < x_threshold and y < y_threshold]

The last line of code throws the error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Solutions?

I came up with the following code as a possible solution:

x_index = x < x_threshold
y_index = y < y_threshold

x_index.shape, y_index.shape
# ((3, 3, 3, 64), (3, 3, 3, 64))

x_index == y_index

# Find indices where the two masks differ (exactly one condition is True)-
x[np.where(x_index != y_index)].shape
# (542,)

# Find indices where the two masks agree (both True or both False)-
x[np.where(x_index == y_index)].shape
# (1186,)


# Create a copy of 'x'-
x_mod = x.copy()

# Set to 0 all values in 'x_mod' where the two masks agree-
x_mod[np.where(x_index == y_index)] = 0

# Sanity check-
np.count_nonzero(x_mod)
# 542

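The count of 542 matches the number of positions where the two masks differ. Note, however, that x_index == y_index is also True where neither condition holds, so it is not the same as requiring both conditions at once. The solution presented by user "bb1" below does that correctly and is just a one-liner!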
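To see how comparing the masks with == differs from combining them with &, here is a minimal sketch on a tiny example. The four mask values are made up for illustration and only stand in for x_index and y_index from the code above:

```python
import numpy as np

# Made-up masks standing in for x_index and y_index (illustrative values only)
x_index = np.array([True, True, False, False])
y_index = np.array([True, False, True, False])

agree = x_index == y_index  # True where both are True OR both are False
both = x_index & y_index    # True only where both are True

print(agree)  # [ True False False  True]
print(both)   # [ True False False False]
```

The last position is the difference: neither condition holds there, yet the == comparison still marks it.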

score 1 · Accepted Answer · answered Apr 11 '21 at 17:02

Use this instead:

x[(x < x_threshold) & (y < y_threshold)]

The condition (x < x_threshold) & (y < y_threshold) gives an elementwise logical and of boolean arrays.

By contrast, (x < x_threshold) and (y < y_threshold) tries to reduce each array to a single True or False value, and fails because the truth value of an array with more than one element is ambiguous.
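Put together with the setup from the question, the mask can be used in a few ways depending on what "remove" should mean. The following is only a sketch: the seeded generator and the names mask, selected, kept, and x_zeroed are assumptions introduced for illustration, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary, used only for reproducibility
x = rng.random((3, 3, 3, 64))
y = rng.random((3, 3, 3, 64))

x_threshold = np.percentile(x, 20)
y_threshold = np.percentile(y, 20)

# Elementwise AND; the parentheses are required because & binds tighter than <
mask = (x < x_threshold) & (y < y_threshold)

selected = x[mask]                 # 1-D array: values where both conditions hold
kept = x[~mask]                    # 1-D array: all remaining values
x_zeroed = np.where(mask, 0.0, x)  # same shape as x, matched entries set to 0

# np.logical_and is an equivalent spelling of the same mask
same = np.array_equal(mask, np.logical_and(x < x_threshold, y < y_threshold))
```

Boolean indexing always returns a flattened 1-D result, so np.where (or assignment through the mask on a copy) is the option to reach for when the original (3, 3, 3, 64) shape has to be preserved.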
