0

I have three nested arrays, containing around 10,000 elements (each array has a different number of elements). These arrays are ordered with respect to the 0th element and the 1st element in each inner list has no real pattern.

So for example,

a = np.array([[1,13],[2,36],[5,63],[10,35],[11,2]...])
b = np.array([[1,13],[3,32],[7,55],[10,30],[13,21],[15,10]...])
c = np.array([[2,10],[4,36],[5,58],[8,5]...])

What i need to do is combine the arrays and then sort them with respect to the 0th element. I know of a simple method using

D = np.concatenate((a,b,c))

to combine them, and then using,

D_sort =sorted(D, key = itemgetter(0))

to sort them w.r.t the 0th element. However, this is very time consuming, and I have been wondering if there is a solution using the fact that the 0th element in each array a,b and c are sorted.

So to reiterate, is there an efficient method of combining three nested arrays and sort them w.r.t the 0th element given that the 0th element in each individual array is already sorted?

For the example given, the output would be,

[([ 1, 13], [ 1, 13],[ 2, 36],[ 2, 10],[ 3, 32],[ 4, 36],[ 5, 63],[ 5, 58],[ 7, 55],[8, 5],[10, 35],[10, 30],[11,  2],[13, 21],[15, 10])]
Holtz
  • 323
  • 1
  • 6
  • 14
  • possible duplicate of [Combining two sorted lists in Python](http://stackoverflow.com/questions/464342/combining-two-sorted-lists-in-python) – Robᵩ Sep 26 '13 at 15:16

2 Answers2

0

Have a look at heapq.merge - doesn't it do what you need?

volferine
  • 372
  • 1
  • 9
  • [Doc](http://docs.python.org/2/library/heapq.html) says, "assumes that each of the input streams is already sorted". OP's input is not sorted according to `merge`'s key. – Robᵩ Sep 26 '13 at 15:34
0

However, this is very time consuming

I suppose that depends upon your point of view. Consider these times:

In [84]: a,b,c=(sorted([random.randint(1,1000000),random.randint(1,1000000)] for _ in range(random.randint(9000,11000))) for _ in range(3))

In [85]: %timeit sorted(a+b+c)
100 loops, best of 3: 7.38 ms per loop

In [86]: %timeit heapq.merge(sorted(a),sorted(b),sorted(c))
100 loops, best of 3: 2.53 ms per loop

In [87]: %timeit heapq.merge(a,b,c)
1000000 loops, best of 3: 427 ns per loop

Note: I could only call heapq.merge because my inputs were fully sorted.

Obviously heapq.merge is quicker (by 104), but if your input isn't fully sorted, then it simply isn't an option. heapq is, I believe, pure python, so you could possibly re-implement heapq.merge with a key= parameter.

Robᵩ
  • 163,533
  • 20
  • 239
  • 308