45

How can I use numpy unique without sorting the result but just in the order they appear in the sequence? Something like this?

a = [4,2,1,3,1,2,3,4]

np.unique(a) = [4,2,1,3]

rather than

np.unique(a) = [1,2,3,4]

Use naive solution should be fine to write a simple function. But as I need to do this multiple times, are there any fast and neat way to do this?

kuantkid
  • 569
  • 1
  • 4
  • 5

2 Answers2

69

You can do this with the return_index parameter:

>>> import numpy as np
>>> a = [4,2,1,3,1,2,3,4]
>>> np.unique(a)
array([1, 2, 3, 4])
>>> indexes = np.unique(a, return_index=True)[1]
>>> [a[index] for index in sorted(indexes)]
[4, 2, 1, 3]
del
  • 6,341
  • 10
  • 42
  • 45
  • Always helpful to link the docs: [numpy.unique](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.unique.html) – BoltzmannBrain Nov 15 '17 at 16:02
  • 2
    Yes this gets the unique indices, but is the sorting necessary? The iterations req'd to sort are the same as just searching through the array for unique items, so the time complexity can't be avoided. But numpy.unique returns a new array object. We should be able to avoid this space complexity. – BoltzmannBrain Nov 15 '17 at 16:09
  • 5
    Works but this really should be built-in as an option to np.unique. – NoName Feb 04 '20 at 12:58
  • 3
    From [this answer](https://stackoverflow.com/a/15637512/900394) just use `pandas.unique()`. It doesn't sort by default. – Alaa M. May 18 '21 at 19:18
  • 1
    FYI: [this answer](https://stackoverflow.com/a/15637512) (linked above) even presents a shorter solution using np.unique (and calling it only once): `a[np.sort(np.unique(a, return_index=True)[1])]` – BottleNick Nov 10 '21 at 09:21
  • @AlaaM. pandas is a very large module; adding such a `requirement` for installation of your package is not suitable if you want to keep it simple – ivan866 Jan 06 '23 at 08:24
6

You could do this using numpy by doing something like this, the mergsort is stable so it'll let you pick out the first or last occurrence of each value:

def unique(array, orderby='first'):
    array = np.asarray(array)
    order = array.argsort(kind='mergesort')
    array = array[order]
    diff = array[1:] != array[:-1]
    if orderby == 'first':
        diff = np.concatenate([[True], diff])
    elif orderby == 'last':
        diff = np.concatenate([diff, [True]])
    else:
        raise ValueError
    uniq = array[diff]
    index = order[diff]
    return uniq[index.argsort()]

This answer is very similar to:

def unique(array):
    uniq, index = np.unique(array, return_index=True)
    return uniq[index.argsort()]

But, numpy.unique uses an unstable sort internally so you're not guaranteed to get any specific index, ie first or last.

I think an ordered dict might also work:

def unique(array):
    uniq = OrderedDict()
    for i in array:
         uniq[i] = 1
    return uniq.keys()
Bi Rico
  • 25,283
  • 3
  • 52
  • 75
  • Thanks for your quick reply. I have thought about the first one, but I am not sure whether it is the fastest. The second one should suffer from putting a numpy object into a python object implicitly :) – kuantkid Oct 17 '12 at 04:21
  • Is the problem with the second `unique` using `np.unique`'s `return_index` argument that it might produce incorrect results? That this `unique` might return a sequence with some elements not respecting the order imposed by the original sequence, e.g., (purely for demonstration) `unique([1,0,1]) --> [0, 1]`? – Ahmed Fasih Oct 06 '13 at 12:29
  • `np.unique`'s documentation (http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html) states that the indexes returned with `return_index=True` will indicate the *first* occurrences, so your second `unique` should be safe and correct, right? – Ahmed Fasih Oct 06 '13 at 12:36