In decision trees, we can speed up the search for a good split by first sorting the training samples by a given feature column. Since all the features are numbers, it has been suggested that radix sort might actually be the fastest option. However, I can't figure out how to use radix sort to sort a NumPy array by a column.

From here, we can sort a NumPy array by a column (for example, column 1) using mergesort, quicksort, or heapsort:

a[a[:,1].argsort()]
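To make the pattern concrete, here is a small sketch with a made-up array (the values are only for illustration). `argsort` returns the row order that sorts column 1, and indexing with that order rearranges whole rows. The `kind` argument picks the algorithm; `"mergesort"` is used here because it is stable, so rows with equal keys keep their original relative order.

```python
import numpy as np

# Made-up sample data: 4 rows, 3 feature columns.
a = np.array([[3, 20, 7],
              [1, 10, 9],
              [2, 30, 8],
              [4, 10, 6]])

# Indices that would sort column 1; a stable kind keeps ties in input order.
order = a[:, 1].argsort(kind="mergesort")

# Fancy indexing with those indices reorders the full rows.
sorted_a = a[order]
print(sorted_a)
```

Rows 1 and 3 both have the key 10, and the stable sort leaves row 1 ahead of row 3.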

Here's an implementation of radix sort in Python that works well.

How do you combine the two? I'd love to get something of the following behavior:

a[a[:, 1].argsort(kind="radix")]

Is this possible?
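As far as I can tell, `argsort` has no `kind="radix"` option, so the closest thing would be a hand-written radix argsort that returns indices the same way `argsort` does. Below is a minimal sketch of that idea, not a tuned implementation. The name `radix_argsort` and the `bits_per_pass` parameter are hypothetical, and it assumes the column holds non-negative integers. It runs a stable counting sort on each group of bits, least significant first, and permutes indices instead of values.

```python
import numpy as np

def radix_argsort(keys, bits_per_pass=8):
    """Hypothetical helper: stable LSD radix argsort for non-negative integers."""
    keys = np.asarray(keys)
    if keys.ndim != 1 or not np.issubdtype(keys.dtype, np.integer):
        raise ValueError("expected a 1-D integer array")
    if keys.size == 0:
        return np.empty(0, dtype=np.intp)
    if keys.min() < 0:
        raise ValueError("this sketch only handles non-negative keys")

    radix = 1 << bits_per_pass
    mask = radix - 1
    max_key = int(keys.max())
    order = np.arange(keys.size, dtype=np.intp)  # current permutation of indices

    shift = 0
    while (max_key >> shift) > 0:
        # Digit of every key, visited in the current order.
        digits = ((keys[order] >> shift) & mask).astype(np.intp)
        counts = np.bincount(digits, minlength=radix)
        next_slot = np.cumsum(counts) - counts  # first output slot per digit

        # Stable counting sort on this digit, moving indices rather than keys.
        new_order = np.empty_like(order)
        for idx, d in zip(order, digits):
            new_order[next_slot[d]] = idx
            next_slot[d] += 1

        order = new_order
        shift += bits_per_pass
    return order

# Usage: sort the rows of a 2-D array by column 1.
a = np.array([[3, 170], [1, 45], [2, 802], [4, 45], [5, 2]])
sorted_a = a[radix_argsort(a[:, 1])]
```

The inner loop runs in Python, so this will most likely be much slower than NumPy's compiled sorts; it only shows how the two pieces fit together. Newer NumPy releases document that `kind="stable"` may use radix sort internally for integer dtypes of 16 bits or less, which might be the practical route if the feature values fit in such a dtype.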

Nick
  • No. At least not without digging deep into both that `radixsort` code, and the `argsort` code. Have you tested that `radix` sort on your data, ie on `a[:,1]`? – hpaulj Mar 22 '17 at 00:12
