
Suppose I have two unordered lists of equal length in Python:

a = [5, 2, 3, 1, 4]
b = ['d', 'b', 'a', 'c', 'e']

Is there an O(n), in-place algorithm to obtain the following result?

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
Thanatos
dangerChihuahua007
  • `Is there an O(n), in-place algorithm to obtain the following result?` No, the lower bound for sorting algorithms is provably `O(n*log(n))`, and an `O(n)` solution to your problem would contradict that. – Lie Ryan Jan 08 '12 at 07:12
  • "In place" means what here, exactly? You're starting with two lists, and ending with one: is the answer supposed to end up in `a` or `b` with no intermediate memory allocation? (At least, as best you can control in Python?) – Thanatos Jan 08 '12 at 07:31
  • Yes, it is possible, since there is no sparsity (no gaps between the integers). Assuming you know you have max(a)-min(a) <= N, then an O(N) solution is possible! See my implementation below. – Rusty Rob Jan 08 '12 at 08:24

4 Answers


You're looking for the zip and sorted built-in functions.

r = zip(sorted(a), sorted(b))

zip takes two iterables and pairs them together in sequence (so if the lists were unsorted, you'd get (5, 'd') as your first tuple), and any excess values are truncated/ignored, since zip stops at the end of the shorter iterable.
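A quick sketch of that pairing and truncation behavior, using a deliberately shortened `b` (Python 3 shown, where zip returns an iterator):

```python
a = [5, 2, 3, 1, 4]
b = ['d', 'b', 'a']  # deliberately shorter, to show truncation

# zip pairs elements positionally and stops at the shorter input
pairs = list(zip(a, b))
print(pairs)  # [(5, 'd'), (2, 'b'), (3, 'a')]
```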

sorted, the last time I looked into the code base, uses Timsort, which switches strategy (falling back to insertion sort) depending on the size of the runs it finds - it performs at about O(n*log(n)). There isn't a practical comparison-based sort out there that gives you O(n) performance, since you still have to compare an individual value with the rest of the values some number of times.

If you want an in-place sort, you can use the list.sort() method, which does perform an in-place sort. This changes the code to the following:

a.sort()
b.sort()
r = zip(a, b)
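Run against the lists from the question (Python 3 shown, where zip must be wrapped in list to materialize the pairs), that gives:

```python
a = [5, 2, 3, 1, 4]
b = ['d', 'b', 'a', 'c', 'e']

a.sort()  # sorts in place, O(n log n)
b.sort()
r = list(zip(a, b))
print(r)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
```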
Makoto
  • Small nitpick: the poster asked for an in-place algorithm, and `sorted` creates a new copy of each list. To sort the lists in-place, call `a.sort()` and `b.sort()` in the lines before the call to `zip`. – HardlyKnowEm Jan 08 '12 at 07:14
  • Mm, seems you're right. I'll adjust that in my answer. Thanks for the heads-up. @mlefavor: Yeah, that ain't happenin'. – Makoto Jan 08 '12 at 07:16
  • 1
    ain't ```sort()``` takes ```O(nlogn)```? – iloahz Jan 08 '12 at 07:43
  • I don't think this works... Sorting probably runs in O(n log n), not O(n) – templatetypedef Jan 08 '12 at 07:48
  • To address concerns about the timing of the sort: yes, I was explicit in stating that those sorts weren't O(n). There really isn't any O(n) sort that I can think of (except maybe sleep sort, but no one has the time to wait for a large number to wake up again). – Makoto Jan 08 '12 at 08:00
  • +1, as sorting is usually O(n*log(n)); HOWEVER, that assumes there is sparsity in the items you are sorting. Otherwise an O(N) solution does exist. Please see my solution. – Rusty Rob Jan 08 '12 at 08:22

I don't think there is.

sort() is considered to take O(n*log(n)), and your requirement is something more than sorting (though only a little bit). If there were some kind of O(n) algorithm for this, we could also use it to replace sort(), which has been studied for a long time and is not likely to have an O(n) algorithm.
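To make that reduction concrete: if some hypothetical `pair_sorted` function solved the question's problem in O(n), it would also sort an arbitrary list in O(n). A sketch (`pair_sorted` is an assumed name; it is stubbed out here with `sorted` purely to show the shape of the argument, not to claim O(n)):

```python
def sort_via_pairing(lst, pair_sorted):
    # Pair the list with itself; the first components come out sorted,
    # so an O(n) pair_sorted would yield an O(n) sort - contradicting
    # the comparison-sort lower bound.
    return [x for x, _ in pair_sorted(lst, lst)]

# Stand-in pairing built from sorted(), only to demonstrate the reduction:
demo_pairing = lambda a, b: list(zip(sorted(a), sorted(b)))
print(sort_via_pairing([5, 2, 3, 1, 4], demo_pairing))  # [1, 2, 3, 4, 5]
```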

iloahz
  • Hi: assuming sorting takes time O(f(n)), then because he needs to sort twice and zip the results together, the time taken is O(f(n))+O(f(n))+O(n) = sort + sort + join. Collecting terms gives O(2*f(n))+O(n) = O(f(n)), provided f(n) >= n. – Rusty Rob Jan 08 '12 at 08:34

zip will give you a linear-time (but not in-place) pairing of elements. izip from itertools has a constant memory footprint, but you'd need to do a linear-time scan each time you access an element out of order, and then reset your generator.

If you can afford an O(n log(n)) in place sorting algorithm, there's a great question and answer about the default implementation of sort() here.

I think the best approach, for most applications where the lists are large enough for memory and computation time to matter, would be to call sort on each list and then use itertools.izip to create a generator over the results. This approach has constant memory overhead, and is as good as you can get in asymptotic computation time on a generic list.
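A sketch of that approach (itertools.izip is Python 2 only; in Python 3 the built-in zip already returns a lazy iterator, so it plays the same role):

```python
a = [5, 2, 3, 1, 4]
b = ['d', 'b', 'a', 'c', 'e']

a.sort()  # both sorts run in place, O(n log n) each
b.sort()

# Lazy pairing: no third list is materialized up front
pairs = zip(a, b)  # itertools.izip(a, b) on Python 2
first = next(pairs)
print(first)  # (1, 'a')
```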

Linear-time sorting can be done with radix sort, or some variation; however, this is not in place and makes some assumptions about your datatypes (i.e., arrays of ints or chars work, but floats and BigInts get messy).
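A minimal LSD radix sort sketch for non-negative integers, to illustrate (Python 3 shown; as noted above, it is not in place and allocates bucket lists):

```python
def radix_sort(nums, base=10):
    # LSD radix sort for non-negative ints; O(d * n) for d-digit numbers
    if not nums:
        return nums
    max_val = max(nums)
    exp = 1
    while max_val // exp > 0:
        # Stable pass: distribute by the current digit, then re-collect
        buckets = [[] for _ in range(base)]
        for n in nums:
            buckets[(n // exp) % base].append(n)
        nums = [n for bucket in buckets for n in bucket]
        exp *= base
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```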

Side bar: the bucket sort article on wikipedia needs some attention if anyone in this community has some free time.

stein

Yes, there is a way to get O(N) when sorting N non-negative integers that are all smaller than N. The way to do it is to use buckets. Here is an implementation (Python 2):

def _sort(_list):
    # Bucket (counting) sort: assumes every value is an int in [0, len(_list))
    buckets = [0] * len(_list)
    for i in _list:
        i = int(i)
        assert 0 <= i < len(_list)
        buckets[i] += 1
    result = []
    for num, count in enumerate(buckets):
        result.extend([num] * count)  # emit each value `count` times
    return result


# Map the letters to small integers, bucket-sort those, then map back
alp = map(ord, list("dabce"))
m = min(alp)
alp = [i - m for i in alp]  # shift so the smallest value becomes 0
alp = _sort(alp)
alp = [i + m for i in alp]  # undo the shift
alp = map(chr, alp)

print zip(_sort([1, 3, 2, 0, 4]), alp)
# [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
Rusty Rob
  • You're making sweeping assumptions about your data. First, this assumes a list of only integers (which is an okay assumption). Second, it assumes that the range of integers you have is within the bounds of your list, which may not always be the case. Third, depending on the number of duplicates *k* in the list, for every element *n* you would be performing at O(n+k) on average. – Makoto Jan 08 '12 at 18:39
  • I did assume all the integers were between 1 and n. I realize this is a sweeping assumption in general. However, he only asked about his particular example, which does fit the assumption, and I did state at the head of my answer the restrictions under which you can achieve O(n). Sometimes you can make the sweeping assumption, e.g. if you have some upper bound such as a maximum auto-increment. – Rusty Rob Jan 08 '12 at 20:27
  • There are naturally occurring phenomena where one should make these assumptions; albeit these phenomena rarely occur, I still hope you found my solution "interesting" :) – Rusty Rob Jan 08 '12 at 20:31