I have an array of 2-character strings and want to get the unique strings in each row.
import numpy as np
np.__version__
# '1.19.2'
arr = np.array([['Z7', 'Q4', 'Q4'],  # 2 unique strings
                ['Q4', 'Z7', 'Q4'],  # 2 unique strings
                ['Q4', 'Z7', 'Z7'],  # 2 unique strings
                ['Z7', 'Z7', 'Q4'],  # 2 unique strings
                ['D8', 'D8', 'L1'],  # 2 unique strings
                ['L1', 'L1', 'D8']], dtype='<U2')  # 2 unique strings
It is guaranteed that every row contains the same number of unique strings; in my case, that number is 2.
Expected output:
array([['Q4', 'Z7'],
['Q4', 'Z7'],
['Q4', 'Z7'],
['Q4', 'Z7'],
['D8', 'L1'],
['D8', 'L1']], dtype='<U2')
Here, each row is sorted, but it doesn't have to be. Either order is fine.
My code:
np.apply_along_axis(np.unique, 1, arr)
# array([['Q4', 'Z7'],
# ['Q4', 'Z7'],
# ['Q4', 'Z7'],
# ['Q4', 'Z7'],
# ['D8', 'L1'],
# ['D8', 'L1']], dtype='<U2')
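Since np.apply_along_axis calls np.unique once per row in Python, a vectorized variant might be worth comparing. The following is only a sketch that leans on the guarantee above (every row has the same number of unique strings); the names s, mask, and out are placeholders of mine, and the result is sorted within each row.

```python
import numpy as np

arr = np.array([['Z7', 'Q4', 'Q4'],
                ['Q4', 'Z7', 'Q4'],
                ['Q4', 'Z7', 'Z7'],
                ['Z7', 'Z7', 'Q4'],
                ['D8', 'D8', 'L1'],
                ['L1', 'L1', 'D8']], dtype='<U2')

# Sort each row so that equal strings end up next to each other.
s = np.sort(arr, axis=1)

# Keep the first element of every row, plus every element that
# differs from its left neighbour.
mask = np.ones(s.shape, dtype=bool)
mask[:, 1:] = s[:, 1:] != s[:, :-1]

# Assumption: every row has the same number of unique strings,
# so the flat selection reshapes cleanly into one row per input row.
out = s[mask].reshape(len(s), -1)
```

If the rows could have different numbers of unique strings, the reshape would either fail or silently mix rows, so this only applies under the stated guarantee.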
I thought np.unique over axis 1 would give the expected result, but it doesn't:
np.unique(arr, axis=1)
# array([['Q4', 'Q4', 'Z7'],
# ['Q4', 'Z7', 'Q4'],
# ['Z7', 'Z7', 'Q4'],
# ['Q4', 'Z7', 'Z7'],
# ['L1', 'D8', 'D8'],
# ['D8', 'L1', 'L1']], dtype='<U2')
I don't understand what exactly happened here or why it returned this particular output.
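One possible reading, going by the np.unique documentation: with axis=1, each whole column is treated as a single element, so the call returns the unique columns of arr, sorted lexicographically, rather than the unique values within each row. The sketch below checks that reading against the output above; cols and by_hand are hypothetical names used only for illustration.

```python
import numpy as np

arr = np.array([['Z7', 'Q4', 'Q4'],
                ['Q4', 'Z7', 'Q4'],
                ['Q4', 'Z7', 'Z7'],
                ['Z7', 'Z7', 'Q4'],
                ['D8', 'D8', 'L1'],
                ['L1', 'L1', 'D8']], dtype='<U2')

# Treat every column as one tuple, drop duplicate columns,
# and sort the remaining columns lexicographically.
cols = sorted(set(map(tuple, arr.T)))

# Put the sorted columns back side by side.
by_hand = np.array(cols, dtype=arr.dtype).T

result = np.unique(arr, axis=1)

# True if np.unique(arr, axis=1) matches the column-wise reading.
same_as_by_hand = np.array_equal(result, by_hand)
```

All three columns of arr are distinct, so under this reading nothing is removed and the columns are only reordered, which would explain why the output has the same shape as the input.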