Suppose I have two unordered lists of equal length in Python:
a = [5, 2, 3, 1, 4]
b = ['d', 'b', 'a', 'c', 'e']
Is there an O(n), in-place algorithm to obtain the following result?
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
You're looking for the zip and sorted built-in functions.
r = zip(sorted(a), sorted(b))
zip takes two iterables and pairs them together in sequence, so if the lists were unsorted, you'd get (5, 'd') as your first tuple. Any excess values from the longer iterable are ignored, since they can't be paired.
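For instance, a minimal sketch of that truncation behaviour (the list names here are just illustrative):

```python
# zip pairs elements positionally and stops at the shorter iterable.
nums = [1, 2, 3]
letters = ['a', 'b']  # one element shorter

pairs = list(zip(nums, letters))
print(pairs)  # [(1, 'a'), (2, 'b')] -- the excess 3 is dropped
```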
sorted, the last time I looked into the code base, uses Timsort, which switches strategy depending on the size of the runs it finds in your data; it performs at about O(n*log(n)). There isn't a practical general-purpose sort out there that gives you O(n) performance, since a comparison sort still has to compare each value against others, which takes on the order of n*log(n) comparisons in the worst case.
If you want an in-place sort, you can use the list.sort() method, which does sort in place. This changes the code to the following:
a.sort()
b.sort()
r = zip(a, b)
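Putting it together with the lists from the question (a quick sketch; note that in Python 3 zip is lazy, so list() is needed to materialize the pairs, and sorting in place does mutate a and b):

```python
a = [5, 2, 3, 1, 4]
b = ['d', 'b', 'a', 'c', 'e']

# Sort each list in place, then pair them up positionally.
a.sort()
b.sort()
r = list(zip(a, b))
print(r)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
```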
I don't think there is. sort() is considered to take O(n log n), and your requirement is something more than sorting (though only a little bit). If there were some kind of O(n) algorithm for this, we could also use it to replace sort(), which has been studied for a long time and is not likely to have a general O(n) algorithm.
zip will give you a constant-time (but not in-place) pairing of elements. izip from itertools has a constant memory footprint, but you'd need to do linear-time scans each time you access an element out of order, and then reset your generator.
If you can afford an O(n log(n)) in-place sorting algorithm, there's a great question and answer about the default implementation of sort() here.
I think the best approach for most applications, where the lists are large enough for memory and computation time to matter, would be to call sort on each array and then use the itertools.izip function to create a generator on the results. This approach has constant memory overhead, and is as good as you can get for asymptotic computation time on a generic array.
Linear-time sorting can be done with radix sort, or some variation; however, this is not in place and makes some assumptions about your datatypes (i.e., arrays of ints or chars work, but floats and big integers get messy).
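As an illustration, here is a minimal LSD radix sort sketch for non-negative integers (not in place, and the base-10 choice is arbitrary; the function name is just for this example):

```python
def radix_sort(values, base=10):
    """Sort non-negative integers in O(d * n) time, where d is the
    number of digits in the largest value."""
    if not values:
        return []
    result = list(values)
    place = 1
    while place <= max(result):
        # Counting sort by the current digit; it is stable, so the
        # ordering from earlier (lower-digit) passes is preserved.
        buckets = [[] for _ in range(base)]
        for v in result:
            buckets[(v // place) % base].append(v)
        result = [v for bucket in buckets for v in bucket]
        place *= base
    return result

print(radix_sort([170, 45, 75, 90, 2, 802, 24, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```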
Side bar: the bucket sort article on wikipedia needs some attention if anyone in this community has some free time.
Yes, there is a way to get O(N) when sorting non-negative integers less than N. The way to do it is to use buckets (a counting sort). Here is an implementation:
def _sort(_list):
    # Counting sort: assumes every value is an int in range(len(_list)).
    buckets = [0] * len(_list)
    for i in _list:
        i = int(i)
        assert 0 <= i < len(_list)
        buckets[i] += 1
    result = []
    for num, count in enumerate(buckets):
        result.extend([num] * count)
    return result

alp = [ord(c) for c in "dabce"]
m = min(alp)
alp = [i - m for i in alp]   # shift values into range(len(alp))
alp = _sort(alp)
alp = [i + m for i in alp]   # shift back
alp = [chr(i) for i in alp]
print(list(zip(_sort([1, 3, 2, 0, 4]), alp)))
# [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]