In decision trees, we can speed up the search for a good split by first sorting the training samples by a given feature column. However, since all the features are numeric, it has been suggested that radix sort might actually be the fastest option. I can't figure out how to use radix sort to sort a NumPy array by a column, though.
From here, we can sort a NumPy array by a column (for example, column 1) using mergesort, quicksort, or heapsort:
a[a[:,1].argsort()]
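To make that one-liner concrete, here is a small self-contained sketch. The array values and the names `order` and `sorted_a` are made up for illustration; the `kind` argument is how `argsort` selects one of the algorithms listed above.

```python
import numpy as np

# Made-up sample data: each row is a training sample, each column a feature.
a = np.array([
    [1.0, 9.0, 0.5],
    [2.0, 3.0, 0.1],
    [3.0, 7.0, 0.9],
])

# argsort returns the row order that sorts column 1; `kind` picks the algorithm.
order = a[:, 1].argsort(kind="mergesort")

# Indexing with that order rearranges whole rows, not just the one column.
sorted_a = a[order]
print(sorted_a)
```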
Here's an implementation of radix sort in Python that works well.
How do you combine the two? I'd love something with the following (hypothetical) behavior:
a[a[:, 1].argsort(kind="radix")]
Is this possible?
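For reference, as far as I know `argsort` only accepts `"quicksort"`, `"mergesort"`, `"heapsort"`, and `"stable"` for `kind`, so `kind="radix"` above is wishful syntax. Below is a minimal sketch of the behavior I'm after, not a definitive implementation: a least-significant-digit radix argsort built from stable counting-sort passes. The helper name `radix_argsort` and its `bits_per_pass` parameter are hypothetical, and the sketch assumes the column holds non-negative integers.

```python
import numpy as np

def radix_argsort(keys, bits_per_pass=8):
    """Hypothetical helper: LSD radix argsort for 1-D non-negative integers."""
    keys = np.asarray(keys)
    if keys.ndim != 1 or not np.issubdtype(keys.dtype, np.integer):
        raise TypeError("this sketch only handles 1-D integer arrays")
    if keys.size and keys.min() < 0:
        raise ValueError("this sketch assumes non-negative keys")

    order = np.arange(keys.size)
    if keys.size == 0:
        return order

    mask = (1 << bits_per_pass) - 1
    max_key = int(keys.max())
    shift = 0
    # One stable counting-sort pass per digit, least significant digit first.
    while (max_key >> shift) > 0:
        digits = ((keys[order] >> shift) & mask).astype(np.intp)
        counts = np.bincount(digits, minlength=mask + 1)
        positions = np.cumsum(counts) - counts  # first output slot per digit
        new_order = np.empty_like(order)
        # Plain Python loop for clarity; it keeps equal digits in their
        # current relative order, which is what makes the pass stable.
        for idx, d in zip(order.tolist(), digits.tolist()):
            new_order[positions[d]] = idx
            positions[d] += 1
        order = new_order
        shift += bits_per_pass
    return order

# Made-up data: sort the rows by column 1.
a = np.array([[3, 170], [1, 45], [2, 75], [9, 802], [4, 2], [5, 45]])
sorted_a = a[radix_argsort(a[:, 1])]
print(sorted_a)
```

The inner loop runs in pure Python, so this is only meant to show the shape of the idea and will be much slower than NumPy's built-in sorts. It also does not handle negative or floating-point columns.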