For more than ~40 rows, these solutions (the numpy one for up to about 1000 rows, then the Pandas-based one) are the fastest so far.
Here is what I would do for a vectorized operation (fast, no Python loops):
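import numpy as np
import pandas as pd

def unique_pairs(a):
    # Order each pair so that (x, y) and (y, x) produce the same key.
    df = pd.DataFrame({'x': a.min(axis=1), 'y': a.max(axis=1)})
    # keep=False flags every row whose key occurs more than once.
    return a[~df.duplicated(keep=False)]

B = unique_pairs(A)

# on your example:
>>> B
array([[25, 73],
       [99, 95]])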
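To make the two steps visible, here is a small walkthrough on a made-up array (the values are purely illustrative, not the array from the question). It is only a sketch of the same idea: build an order-independent key per row, then drop every row whose key occurs more than once.

```python
import numpy as np
import pandas as pd

# Hypothetical input, chosen for illustration only.
A = np.array([[1, 2], [2, 1], [3, 4], [5, 5], [4, 3], [6, 7]])

# Step 1: order each pair so that (1, 2) and (2, 1) share the same key.
keys = pd.DataFrame({'x': A.min(axis=1), 'y': A.max(axis=1)})

# Step 2: keep=False marks every member of a duplicated group,
# not only the second and later occurrences.
is_dup = keys.duplicated(keep=False).to_numpy()

B = A[~is_dup]
print(is_dup.tolist())  # [True, True, True, False, True, False]
print(B.tolist())       # [[5, 5], [6, 7]]
```

With the default `keep='first'`, the first occurrence of each repeated pair would survive, which is not what is wanted here.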
If you are looking for a pure numpy solution (alas, as per the note below, it is slower for large arrays):
def np_unique_pairs(a):
    z = np.stack((a.min(axis=1), a.max(axis=1)))
    _, rix, cnt = np.unique(z, return_inverse=True, return_counts=True, axis=1)
    return a[(cnt==1)[rix]]
Performance
A = np.random.randint(0, 10, (1_000_000, 2))
%timeit unique_pairs(A)
# 45.6 ms ± 49.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Notes
- np.unique(a, axis=0) is quite a bit slower than the Pandas family of duplicate functions (drop_duplicates(), duplicated(), etc.). See issue #11136.
- there are other ways that could work, such as mapping each pair of numbers on a row onto a single integer. See this SO answer for some ideas.
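As a rough sketch of the "single integer" idea from the last note: the helper below is hypothetical (the name `key_unique_pairs` and the encoding are mine, not taken from the linked answer). It assumes a non-empty array of non-negative integers that are small enough for `lo * base + hi` to fit in int64. I have not benchmarked it against the two solutions above.

```python
import numpy as np

def key_unique_pairs(a):
    # Hypothetical helper: assumes a non-empty (n, 2) array of
    # non-negative integers, small enough that lo * base + hi fits in int64.
    a = np.asarray(a, dtype=np.int64)
    lo, hi = a.min(axis=1), a.max(axis=1)
    base = hi.max() + 1
    # One integer per unordered pair; injective because 0 <= lo, hi < base.
    key = lo * base + hi
    _, inv, cnt = np.unique(key, return_inverse=True, return_counts=True)
    # Keep only the rows whose key occurs exactly once.
    return a[cnt[inv] == 1]

A = np.array([[1, 2], [2, 1], [3, 4], [5, 5], [4, 3], [6, 7]])
print(key_unique_pairs(A).tolist())  # [[5, 5], [6, 7]]
```

The point of the encoding is that np.unique then works on a 1-D array instead of going through the `axis=` code path.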
Speed comparison
Here is a comparison of the speed of 4 methods:
- pd_unique_pairs() is the Pandas solution proposed above (defined there as unique_pairs()).
- np_unique_pairs() is the pure Numpy solution proposed above.
- counter_unique_pairs() is proposed by @Acccumulation and is based on the use of Python loops and Counter.
- loop_unique_pairs() is proposed by @ExploreX and is based on explicit Python loops.

Clearly, for more than 1000 rows, pd_unique_pairs dominates. Between roughly 40 and 1000 rows, np_unique_pairs wins. For very small arrays (under 40 rows), counter_unique_pairs is the most effective.
# additional code for the perfplot above
from collections import Counter

import numpy as np
import perfplot

# the Pandas solution above, under the name used in the plot
pd_unique_pairs = unique_pairs

def counter_unique_pairs(A):
    A_Counter = Counter((tuple(sorted(item)) for item in A))
    single_instances = [item for item in A_Counter if A_Counter[item]==1]
    B = np.array([item for item in A if tuple(sorted(item)) in single_instances])
    return B

def loop_unique_pairs(A):
    B = []
    for j,i in enumerate(A):
        cond_i = (i == A) | (i[::-1] == A)
        if sum(list((cond_i[:,0] & cond_i[:,1]))) == 1:
            B.append((A[j]))
    B = np.array(B)
    return B

perfplot.show(
    setup=lambda n: np.random.randint(0, np.sqrt(n).astype(int), (n, 2)),
    kernels=[pd_unique_pairs, np_unique_pairs, counter_unique_pairs, loop_unique_pairs],
    n_range=[2 ** k for k in range(3, 14)],
    equality_check=None,  # had to disable since loop_unique_pairs appears to be wrong sometimes
)
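On why equality_check had to be disabled: my reading of the code (not something confirmed by its author) is that the condition in loop_unique_pairs tests each column against either value of the pair independently, so a row such as [1, 1] is counted as a match for [1, 2]. The sketch below reproduces just that condition on a made-up input and compares it with a Counter-based reference.

```python
import numpy as np
from collections import Counter

# Made-up input in which no unordered pair repeats.
A = np.array([[1, 2], [1, 1], [3, 4]])

# Reference: count each row as a sorted tuple and keep the singletons.
counts = Counter(tuple(sorted(row)) for row in A.tolist())
expected = [row for row in A.tolist() if counts[tuple(sorted(row))] == 1]

# Same matching condition as in loop_unique_pairs, applied to the row [1, 2].
i = A[0]
cond = (i == A) | (i[::-1] == A)
matches = cond[:, 0] & cond[:, 1]

print(expected)          # [[1, 2], [1, 1], [3, 4]]
print(matches.tolist())  # [True, True, False]
```

Because [1, 2] gets two matches instead of one, the loop version would drop it even though the pair is unique.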