C++ sorting classes faster than qsort

Question

I have a class

class Zaposlenik { 
private:
    string prezime; 
    string funkcija; 
    double placa; 
public:
    bool operator==(const string& prezime) const; 
    bool operator<(const string &prezime) const; 
    bool operator<(const Zaposlenik &other) const;

I use operators with string for binary search and operator< with Zaposlenik for sorting

I can't change the header class I can only write code in .cpp.

I also have

class Firma { 
private: 
vector<Zaposlenik> zaposlenici; 
public: 
void sort();

I can't change that class either,I have to write .cpp for it. I upload the 2 .cpp to a auto grading server that inputs 500 000 Zaposlenik into vector zaposlenici and then preforms 2 000 000 searches.

I used qsort and bsearch and it was too slow. It can't be longer then 3s when I upload it.

I have written overloaded operators and I belive they are fine,but apparently qsort can be faster.

Vector is sorted by string prezime and names are from "aaaa" to "ZZZZ" so 4 letter combination of big and small letters.

string funkcija; and double placa; don't mean anything for sorting.

Can someone tell me which sort would be faster then qsort? Keep in mind that I don't have any control over main and I can't count members when they are made.

P.S. there are other functions in classes but they don't have any meaning for this part. There is also function for Bsearch but that is as fast as it gets I belive.

I would recommend using [`std::sort`](http://en.cppreference.com/w/cpp/algorithm/sort) and let the library implementors worry about the optimal sorting routines. Oh, and also remember to profile and measure before doing any kind of optimization, the bottlenecks might not be where you think they are. — Some programmer dude, Apr 29 '13 at 13:04
All perfomance are lost on memory copying. Best way - to redesign your container to operate pointers instead of classes. But if you can't change headers - try to google about sorting with minimum data movement. — Dmitry Sazonov, Apr 29 '13 at 13:08
@Dmitry No copying is needed at all, just swaps, and they are much faster. — Konrad Rudolph, Apr 29 '13 at 13:10
@DmitrySazonov: Unless profiling tells you otherwise, don't assume that is the bottleneck. Sorting requires swapping, not copying, and string swapping typically is efficient. — MSalters, Apr 29 '13 at 13:11
If I use sort(&zaposlenici[0],&zaposlenici[zaposlenici.size()]); it's slower then qsort(&zaposlenici[0],zaposlenici.size(),sizeof(zaposlenici[0]),compare); and compare is int compare(const void* a,const void* b){ if (*(Zaposlenik*)a < *(Zaposlenik*)b) return -1; else return 1; } sorry I don't know how to make comments go to new line — Medo, Apr 29 '13 at 13:12
@DomagojMedo that `compare` you just said doesn't look like it actually compiles. Casting a `const void*` into an instance of a `class`? — Yakk - Adam Nevraumont, Apr 29 '13 at 13:16
If `qsort` is faster than `std::sort`, it's probable that your comparison function isn't doing everything it should. — James Kanze, Apr 29 '13 at 13:25
@Yakk it compiles but when I pasted code it left out all the * — Medo, Apr 29 '13 at 13:31
@MSalters I'm sure without profiling, that exchanging of two sizeof(void*) values (eg. pointers) are faster, than exchane of any non-empty structures. — Dmitry Sazonov, Apr 29 '13 at 13:32
@Konrad Rudolph what you mean by "swap"? For two pointers it is a ^= b ^= a ^= b; - three xor instructions. For two structs - it is t = a; a = b; b = c; - thee memcpy with temporary buffer. What is faster? — Dmitry Sazonov, Apr 29 '13 at 13:34
@DmitrySazonov I trust my answer (+ your comment there) clarifies this? (Incidentally, swap via temporary is usually faster than via xor on modern architectures ;-)) — Konrad Rudolph, Apr 29 '13 at 13:38
@DomagojMedo include code in `backticks` -- the quote character near ~ that is backwards compared to ' and looks like ` -- in order to get proper code formatting to occur. Outside of comments, an empty line and a 4 space indent is for blocks of code, and backticks are for short bits of code. — Yakk - Adam Nevraumont, Apr 29 '13 at 13:40
@DomagojMedo If you use `qsort( &zaposlenici[0], &zaposlenici[zaposlenici.size()] )`, you have undefined behavior, and will get a runtime error in most implementations, unless you turn off error checking. — James Kanze, Apr 29 '13 at 13:57
And of course, any attempt to use `qsort` on a non-POD class, like yours, is undefined behavior. — James Kanze, Apr 29 '13 at 13:57
@MSalters Swapping strings is typically efficient, but he's sorting `Zapolslenik`, not strings. Neither the compiler nor the library will look into his class, and automatically create an optimized swap for it. — James Kanze, Apr 29 '13 at 14:00
@DmitrySazonov Depending on the class, they're often exactly the same thing. — James Kanze, Apr 29 '13 at 14:00
@DmitrySazonov For two pointers, the fastest solution is usually `Ptr tmp( a ); a = b; b = tmp;` Just as it is for everything else. (And of course, the `^=` doesn't even work.) — James Kanze, Apr 29 '13 at 14:02
@Konrad Rudolph TC doesn't need to swap 2 POD variables, in sorting he is swapping 2 non-POD structures. See declaraion: !!!vector!!!. You are talking that swapping 2 std::string + 1 double is faster or equal than swapping 1 POD value? Please, provide proof code, if yes. — Dmitry Sazonov, Apr 29 '13 at 15:08
@DmitrySazonov Irrelevant (and I didn’t imply that) since you *cannot* perform xor swap on these variables, the only relevant comparison is between pointer types. — Konrad Rudolph, Apr 29 '13 at 15:17
@Konrad Rudolph I only want to show that swap of pointers is much faster, then swap of complex structures. — Dmitry Sazonov, Apr 29 '13 at 20:19
@DmitrySazonov - yes, it is, and is fine with non-copyable or non-moveable items. Sorting items with non-trivial ctors in any other way than using pointers is... 'strange'. — Martin James, Apr 30 '13 at 00:00

score 9 · Accepted Answer · edited May 23 '17 at 11:43

9

Three things:

Use std::sort instead of std::qsort, it’s faster because it can inline calls to the comparison operator (if you define it in the header or enable link-time optimisations).
Override swap for your class so that it can be swapped efficiently instead of copying via a temporary variable. However, that requires changing the header (because you need access to the private variables).
Since your sorted strings are of fixed length 4, a different sorting algorithm will be beneficial. The classical choice, which is fairly easy to implement, is radix sort. Judging from some of your comments it seems likely that your professor wants you to implement this.

edited May 23 '17 at 11:43

Community

1
1

answered Apr 29 '13 at 13:12

Konrad Rudolph

530,221
131
937
1,214

1

Make sure that `bool Zaposlenik::operator<(const Zaposlenik &other) const; ` can be inlined, i.e. define it in the header. – MSalters Apr 29 '13 at 13:13
@MSalters “link-time optimisations”. – There really isn’t a reason not to use them. – Konrad Rudolph Apr 29 '13 at 13:14
The "auto grading server" might not support them. Usually you can only submit code. – MSalters Apr 29 '13 at 13:17
Agree with overriding "swap". But, as I understood, TC doesn't have interface for direct accessing members, and could not modify header files. – Dmitry Sazonov Apr 29 '13 at 13:37
@DmitrySazonov: No need to "override" swap; quality implementations have already done so. (And with non-quality implementations, it's unlikely that `swap` is the biggest problem.) – MSalters Apr 29 '13 at 15:02
1

@MSalters Only in C++11 will you get an efficient `swap`; C++03’s default implementation needs to use the copy constructor. – Konrad Rudolph Apr 29 '13 at 15:08

score 3 · Answer 2 · answered Apr 29 '13 at 23:39

auto grading server that inputs 500 000 Zaposlenik into vector zaposlenici and then preforms 2 000 000 searches. I used qsort and bsearch and it was too slow.

Just to clarify, you did not call qsort before every call to bsearch, right? Because binary search is only fast if the list is already sorted. If you sort the list before every search, you get abysmal performance.

Given the constraints you outlined (all strings are four characters long), I just tested std::sort vs. a custom bucket sort, and on a million elements, the bucket sort was 8 times faster. Hint: strings of four characters can be encoded in 4 * 6 = 24 bits, so you will need 16.777.216 buckets for counting.

score 2 · Answer 3 · answered Apr 29 '13 at 13:49

2

There are several things to take into account. First and foremost, you cannot use qsort on std::vector<Zaposlenik>. qsort uses memcpy to copy objects when it swaps, and memcpy only works on objects with trivial copy. Using qsort in this case may be faster because it doesn't correcly copy the objects. You must use std::sort; nothing else will work.

Having done that: the speed (or at least the party that you can influence) of std::sort depends on two things: the speed of your comparison, and the speed of a swap. You don't show what bool Zaposlenik::operator<( Zaposlenik const& other ) const; does, so we can only guess. If it does anything more than return prezime, other.prezime ), then you should write a separate comparison function, and invoke std::sort with it. The other aspect is the swap: std::sort ultimately uses std::swap, whose default implementation will be something like:

template <typename T>
void
std::sort( T& lhs, T& rhs )
{
    T tmp( lhs );
    lhs = rhs;
    rhs = lhs;
}

For many classes, this involves a lot of extra copying; for std::string, for example, this would do three deep copies of the string, which may involve dynamic allocation and freeing of the memory. std::string, however, has a member function swap; in many cases, it can implement the swap just by swapping pointers, rather than by doing a deep copy. This can result in a significant speed up. Your class Zaposlenik does not do anything to optimize std::swap, however, so you get the deep copies. You should provide a member function swap:

void Zaposlenik::swap( Zaposlenik& other )
{
    swap( prezime, other.prezime );
    swap( funkcija, other.funkcija );
    swap( placa, other.placa );
}

This function will use the system optimized swaps of std::string. To make sure that std::sort uses it, you should also provide an overloaded free function swap (in the same namespace as Zaposlenik), which calls your member function:

void
swap( Zaposlenik& lhs, Zaposlenik& rhs )
{
    lhs.swap( rhs );
}

The reason for this: std::sort calls a free function swap.

answered Apr 29 '13 at 13:49

James Kanze

150,581
18
184
329

bool Zaposlenik::operator<(const Zaposlenik &other) const{ if (prezime < other.prezime) return true; else return false; }; – Medo Apr 29 '13 at 13:54
I can't make swap in Zaposlenik.h – Medo Apr 29 '13 at 13:59
1

@DomagojMedo And why not simply: `return prezime < other.prezime;`. Whoever wrote that code doesn't know C++, or even understand the basics of computer science. – James Kanze Apr 29 '13 at 14:06
@DomagojMedo And you cannot access private members from a new free function? Then you're more or less stuck; there's not much you can do about swapping. – James Kanze Apr 29 '13 at 14:07
,when I use return prezime < other.prezime; execution time is same – Medo Apr 29 '13 at 14:10
Probably, since the compiler will generate exactly the same code in the two cases. The difference is in verbosity. – James Kanze Apr 29 '13 at 14:15
@JamesKanze: Regarding private members: You could be *really* evil and not include the .h file in the .cpp file and declare your own version of Zaposlenik with all members made public. It *should* align perfectly with the original one, but you would just have reserved yourself a nice place in programmer hell :) – bitmask Apr 29 '13 at 14:18
@bitmask That probably would work (although it's clearly undefined behavior). For that matter, just `reinterpret_cast( zaposlinici[i] )` will almost certainly give you a reference to the `prezime` field, unless there are virtual functions somewhere in the class. (At any rate, both would be more reliable than using `qsort`:-).) – James Kanze Apr 29 '13 at 14:25
@bitmask That would violate the ODR. – fredoverflow Apr 29 '13 at 22:56

bitmask · Answer 4 · 2013-04-29T14:13:44.600

1

The issue here really is your restriction of the class header. I suspect the bottleneck is either the swapping operation while sorting or the lexical string comparison (or possibly both). If you cannot change the class definition at all, it's going to be tricky to improve that, since you would have to add a lot of helper code in your implementation and everything gets more complicated than it has to be.

Anyhow, here is the approach I would suggest: Since you are sorting based on strings, implement yourself a specialised version of a Trie, you cannot beat the performance of a Trie when sorting sequences lexicographically. You can implement this data structure entirely in your .cpp file and instantiate it in your Firma::sort method.

As you seem to be focussing on speed, you are probably willing to make a trade-off with regard to memory consumption. So, you implement each Node in your Treap as either an std::vector<std::shared_ptr<Trie>> which you initialise to a length of 256 (with all slots initialised to nullptr) or an std::array<std::shared_ptr<Trie>,256>. You now basically insert each of your strings into the data structure and then read them all out again. This approach is linear in the total size of all strings combined (and thus optimal).

Side note: Note that the 256 slot table at each node incurs a constant factor of 256 when traversing the Trie (i.e. when writing the Firma::zaposlenici member). If you are dealing with ASCII you can reduce the table size to 128 or split individual bytes into half-bytes, thereby incurring an overhead of 2*16 instead of 256.

Edit: If you know that you will only encounter characters from a..z and A..Z then you have a base alphabet size of 2*26 = 52 instead of 256. So your lookup table in each node of the Trie only has to be of size 52 (that is, each node can have at most 52 child nodes).

edited Apr 29 '13 at 14:13

answered Apr 29 '13 at 13:16

bitmask

32,434
14
99
159

When I asked my professor for a little help he said that it would be trivial to implement faster solution and that I should google sort O(n) but that didn't help me at all. Every way that has come up to mind I was unable to do because I don't have access to main. – Medo Apr 29 '13 at 13:20
@DomagojMedo: Could you elaborate what you mean by "I don't have access to main"? Which part is still unclear to you? – bitmask Apr 29 '13 at 13:23
I only upload the 2 .cpp for classes,headers and main are not mine. And I don't know how main looks,I only know that prezime(string im sorting by) is 4 letters long. And frankly I don't know anything about what you just wrote,this is my first year of C and C++ coding. – Medo Apr 29 '13 at 13:24
@DomagojMedo "little" details like "strings are only 4 letters long" are rather important. What are "letters" in this case, any `char` value, or more restricted? – Yakk - Adam Nevraumont Apr 29 '13 at 13:27
from "aaaa" to "ZZZZ" so lower and uppercase of letters and as far as I can tell there aren't 2 same – Medo Apr 29 '13 at 13:28
All the time I'm thinking about bucket or radix sort but I don't know how to implement them... he said that sort should be O (n) and search O(1) but I have no idea how to make search O(1),sounds imposible to me – Medo Apr 29 '13 at 13:33
@DomagojMedo: Sorry, my brain mixed up the name of Treaps and Tries. Treaps do a completely different thing. I was talking about a Trie. Check out the new link above, it will tell you how a Trie works in more detail than my answer. You can implement everything I suggested within your .cpp file (the one you have to upload). No need to touch either main.cpp or any header file (class definitions are allowed within .cpp files). Please come back to me if you still need additional clarification. – bitmask Apr 29 '13 at 13:35
@bitmask the thing is we didn't learn tries and profesor said that it can be done with current knowledge. He said that I need to change my algorithm for sorting – Medo Apr 29 '13 at 13:39
@DomagojMedo: You can implement sort in O(n*wordlength) and search in O(wordlength) using a Trie. I'm pretty sure this is the solution your professor had in mind. Sorting in linear time is provably impossible using comparison based search algorithms (such as `std::sort` and `qsort`). However, since you are not allowed to modify your class definition, you have to get inventive how to extend the life-time of your helper data structure after `Firma::sort` terminates. It's possible, but it's not clean software development. – bitmask Apr 29 '13 at 13:40
@DomagojMedo: Well, regarding "current knowledge" ... the idea of a Trie is quite related to bucket-sort. It's really not that complicated (the Wikipedia page even has a section on sorting). – bitmask Apr 29 '13 at 13:42
@bitmask if I use a Trie can get get search time of O (1) ? – Medo Apr 29 '13 at 13:51
@DomagojMedo You worry me. There are no O(n) algorithms which will sort an `std::vector`. And building any secondary data structure which will support faster sorting will require at least O(n lg n) as well, so it won't be a win. – James Kanze Apr 29 '13 at 13:53
@bitmask If the goal is O(1) search time, and you're not restricted in the time it takes to build the data structure, then a hash table (`std::unordered_set`) would be the fastest. If you know the exact set of input keys, you can even get a perfect hash. But the data won't be sorted. – James Kanze Apr 29 '13 at 13:54
@JamesKanze data needs to be sorted – Medo Apr 29 '13 at 13:57
@DomagojMedo: As I said, you get search time of O(max(wordlength)). If all your words have a fixed maximal size (say, at most length 4) then yes, you can search in O(1) and sort in O(n) (assuming n denotes the total number of words). – bitmask Apr 29 '13 at 14:01
@DomagojMedo That's what I initially understood. In which, case, there is no O(n) solution. – James Kanze Apr 29 '13 at 14:04
@JamesKanze: I'm flying blind here: Is there a decent `std::string` specialisation for hashing in the standard library? Also, on a completely different subject; I think you mixed up `sort` and `swap` in the first block code in your answer (+1 nonetheless). – bitmask Apr 29 '13 at 14:04
@bitmask yes,all words are length 4 – Medo Apr 29 '13 at 14:05
@bitmask There is a decent implementation of `std::unordered_set` (in C++11, at least). The standard (C++11, again) also requires an implementation of `std::hash` for `std::string`; it doesn't say anything about the quality of that function, however, and g++, at least, has serious problems with very large sets where all of the strings are very short. But you can easily provide your own: if all of the strings are length four, the simplest would be to just put one character on each byte of a four byte `unsigned`. – James Kanze Apr 29 '13 at 14:14
Profesor just told me that I should try with hash tables or map – Medo Apr 29 '13 at 14:33
@DomagojMedo: That doesn't make sense; You cannot sort with hash maps. That's the entire point of hashing (you can lookup "very" fast, but you don't get sorted data). Anyway, I guarantee you wont get asymptotically faster than Tries. If your assignment *requires* you to output sorted data use either a specialised `std::swap`+`std::sort` or use a Trie. Your call. **Hash maps will *not* help you sort.** – bitmask Apr 29 '13 at 14:45

C++ sorting classes faster than qsort

4 Answers4