2

I've always wondered why the C++ standard template library doesn't seem to have standard bucket/library (distribution) sorts. These seem underused in modern programming, apparently due to the requirement a way to convert an object to an integer to sort by. Both seem relatively simple to me, so why don't we have this in the library?

template<class RandomAccessIterator, class Index, class index_type=unsigned int>
void std::distribution_sort(
        RandomAccessIterator begin,
        RandomAccessIterator end
        index_type minval,
        index_type maxval,
        Index indexer,);

unsigned int indexer(const std::string& word) 
{
    switch(word.size()) {
    case 0: return 0;
    case 1: return (word[0]<<24);
    case 2: return (word[0]<<24) | (word[1]<<16);
    case 3: return (word[0]<<24) | (word[1]<<16) | (word[2]<<24);
    default: return (word[0]<<24) | (word[1]<<16) | (word[2]<<8) | (word[3]);
    }
}

int main() {
    std::vector<std::string> data;
    data.push_back("");
    data.push_back("APPLES");
    data.push_back("banana");
    std::distribution_sort(data.begin(), data.end(), 0, ~0, indexer);
} 
Mooing Duck
  • 64,318
  • 19
  • 100
  • 158

2 Answers2

2

Both seem relatively simple to me, so why don't we have this in the library?

Lots of things are simple. That's not a good reason to have them in a library.

I suppose the reason would be that std::sort is good enough for most cases.

Peter Alexander
  • 53,344
  • 14
  • 119
  • 168
  • I wrote a distribution_sort proof of concept and tested in MSVC10. For ints, it's runs as low as half the time, but only if the number of ints is 1000-1000000. It runs in 30%-50% of the time for objects that require dynamic memory though. However, it doesn't beat std::sort by the margins I was expecting. – Mooing Duck Aug 23 '11 at 17:07
1

For the same reason there isn't ASCII-to-EBCDIC conversions, database connectivity, natural language analysis, text-to-speech synthesis and a whole raft of other features.

Every decision has an opportunity cost (meaning all the things foregone, by doing something else) and the standards are a contract between programmers and implementers.

I would love to be able to write the program:

int main (void) {
    std::accountingApplication();
    return 0;
}

instead of having to actually code up an accounting application but I fear the implementers may baulk at providing that level of power.

In addition, C and C++ both have a perfectly good sort function for the majority of cases. If it turns out it's not up to the particular data somebody has, they're expected to write their own.

If the standard added a bucket sort, why stop there? Why not give us a separate sort for all sorts of data distributions (even the much maligned bubble sort works well on small sets or sets that are already mostly sorted)?

Because that reasoning would give us the next C++ standard in 2166 rather than around 2025, that's why :-)

See also this related answer for a more detailed explanation of these points.


As an aside, I'm not sure that the requirement to convert your object into a distributable integer is a problem - this could be easily solved with callbacks (like the comparison function in qsort) or standard methods (like Java's toString).

Community
  • 1
  • 1
paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • Actually, I think some implementations do have red-black trees, under std::map and such. However, considering C++0x already has FOUR comparison sorts (and who uses partial_sort_copy?) I didn't think a simple distribution sort was out of line, especially since it's very common to sort data that has unique indecies in a small range. Yes, we can do it ourselves, but instead, everyone uses the quicksort. O(Nlog(N)) instead of the O(2N) – Mooing Duck Aug 18 '11 at 23:56
  • @Mooing, I've changed red-black to database stuff - I didn't take std::map into account :-) – paxdiablo Aug 19 '11 at 01:30