Find the set of elements in a list without sorting in Python

Question

I have a list of elements 1,...,K with repeats. For example for K=4 :

[4 2 1 1 2 1 1 3 2  ]

I want to find the sequence that 1,...,K is appeared in the list (without sorting). For example for the above sequence, the result would be

[4, 2 ,1 ,3 ]

How can I write this algorithm efficiently in python, with less runtime.

Thank you!

I don't think that sorting would help you anyway (since you seem to want the order in which the elements appear), so I'm not sure what the point of that stipulation is. You need to clarify what you are trying to do. — John Coleman, Jul 27 '16 at 19:33

score 3 · Answer 1 · answered Jul 27 '16 at 19:36

3

from collections import OrderedDict
list_numbers=[4,2,1,1,2,1,1,3,2]
print list(OrderedDict.fromkeys(list_numbers))

This gives the desired output - [4, 2, 1, 3]

answered Jul 27 '16 at 19:36

aramase

114
3

score 2 · Answer 2 · edited May 23 '17 at 11:44

The normal list-deduping would probably be good enough:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

reference

However, this is by nature O(N). That's the best you can do in the general case, but you may be able to do better from a practical standpoint for a large class of inputs.

def ordered_dedupe_with_constraints(lst, K):
    output = collections.OrderedDict()
    len_lst = len(lst)
    i = 0
    while len(output) < K and i < len_lst:
        output.setdefault(lst[i], None) 
        i += 1
    return list(output)

This second answer uses the fact that you have at most K distinct elements in lst to break early when the k'th element has been added to the output. Although this is still O(N) in the general case, it's possible that you'll get MUCH better performance from this is K << len_lst and the items are sufficiently shuffled. Of course, you need to know K ahead of time by some means other than iterating to get the max (that would defeat the purpose of our short-circuiting).

If these constraints aren't the case, you're probably better off going with the function f7 as reported in the reference since the implementation there is likely to be more optimal than the implementation here.

score 0 · Accepted Answer · answered Jul 27 '16 at 19:58

Here is another way, which assumes that all numbers in the range 1,...,k appear (as per the problem description):

def inOrder(nums):
    k = max(nums)
    indices = [nums.index(n) for n in range(1,k+1)]
    return [n for i,n in sorted(zip(indices,range(1,k+1)))]

For example

>>> inOrder([4, 2, 1, 1, 2, 1, 1, 3, 2])
[4, 2, 1, 3]

It is O(nk) where n is the length of the list. On the other hand, it uses built-in methods which are fairly quick, and if on average the first appearance of each number is somewhat early in the list, then the runtime will be much better than the worst-case. For example, if you define:

nums = [random.randint(1,1000) for i in range(10**6)]

then the evaluation of inOrder(nums) takes less than a second (even though the list has 1 million entries).

score 0 · Answer 4 · answered Jul 28 '16 at 08:05

This will be O(k).

It will go through the list. For each element, if it's the first time that element appears, it will add it to the list.

If there's a possibility that there are numbers in the list larger than k, or other non integer elements, add an extra check that it's an integer less than k. This code will not ensure that all of the numbers between 0 and k exist in the list.

def findSeq(inputList):
    dict = {}
    newList = []
    for elem in inputList:
        if elem not in dict:
            dict[elem] = True # This can be set to anything
            newList += [elem]
    return inputList

I wrote this first because I misunderstood your question... Didn't want it to go to waste :). This checks if the elements of a list appear in another list in order.

# inList([2, 1, 5], [2, 3, 1, 5]) -> True  
#inList([2, 1, 5], [2, 3, 5, 1]) -> False

def inList(small, big):
    i = 0         # index in small
    j = 0         # index in big
    while j < len(big):
        if small(i) == big(j):
            i += 1
            j += 1
            # Any success is guaranteed to happen here,
            # right after you've found a match
            if i+1 == len(small):
                return True
        else:
            j += 1
    return False

Find the set of elements in a list without sorting in Python

4 Answers4