Algorithm: improve c++ sorting

Question

I have a program that asks to input N-numbers (N<=100000) and then it should output those numbers in sorting order. All numbers are int-s. I have the following code:

#include <iostream> 
#include <vector>
#include <algorithm> 
using namespace std;

int main()
{
    int n,x;
    cin >> n;        
    vector<int> v;
    
    for (int i = 0; i < n; i++) {
        cin >> x;
        v.push_back(x);
    }
    
    sort(v.begin(), v.begin() + n);
    
    for (int i = 0; i < n; i++) {
        cout << v[i];
    }
    return 0;
}

But when I run it on test server one test fails. And it seems it doesn't fit into time limit 1 second.

We don't know if it is spending more time in std::sort or in iostream. (comment out the sort and see if you still get a time fail?) You could also try a sorted container since you are getting the numbers one at a time. — Kenny Ostrom, Dec 22 '20 at 15:54
With current machines, runtime should be way below 1 second. On my machine (Ryzen 3), `{ echo 100000; shuf -i1-100000; } | /tmp/a.out` takes 1/10 of a second. — Olaf Dietsche, Dec 22 '20 at 16:03
If you are sure that the code produces correct results, Try profiling to find the bottleneck. — MrSmith42, Dec 22 '20 at 16:27
Why not `v.end()`? If you know the capacity required, `.reserve()` it. That's likely your bottleneck. — sweenish, Dec 22 '20 at 17:01
Where's the test server? Link to the problem please, so that (among other things) we can test the effects of possible improvements. — Kelly Bundy, Dec 22 '20 at 20:51
Oh and is that even *correct*? You're printing all numbers without any space or other kind of separators between them. — Kelly Bundy, Dec 22 '20 at 20:53

Ted Lyngmo · Accepted Answer · 2020-12-23T23:22:01.897

3

As a vector grows, it sometimes needs to relocate all the data. If you can avoid that, it'll often be faster.

One way is to reserve the number of elements you know that you need. Another is to create the vector with the correct number of elements from start.

vector<int> v(n);                 // created with n elements
    
for(auto& x : v) cin >> x;        // a range-based for loop (no push_back needed)

//sort(v.begin(), v.begin() + n); // no, why  begin() + n?  Do:
sort(v.begin(), v.end());

for(auto x : v) cout << x;        // another range-based for loop

One thing that could possibly speed up the sorting (requires C++17) is to use the overload that accepts an ExecutionPolicy and see if the implementation has support for sorting in parallel:

#include <execution>

    //...

    sort(std::execution::par, v.begin(), v.end()); // try sorting in parallel

I created a quick-bench test for three cases:

Not using the knowledge of n to reserve space at all, letting it realloc.
Using reserve(n)
Using n to create v with the right size at construction

Result:

Initialization of a vector does come with some cost though. Here's a comparison between the winner of the above and using a unique_ptr<int[]> created from a raw new[]:

To sum it up: Sorting may be sped up by using parallel sorting and filling¹ your v can be made ~26 times faster than your current implementation does it:

^{1 - By filling, I'm referring to the actual filling of v, which does not include reading the values}

edited Dec 23 '20 at 23:22

answered Dec 22 '20 at 16:04

Ted Lyngmo

93,841
5
60
108

I doubt the reallocations are a problem, but the range-based loop with references are very nice. – Kelly Bundy Dec 22 '20 at 20:50
@KellyBundy Thanks! The reallocations / relocation quite often prove costly, so If one knows how many elements to allocate from the start, it certainly helps. [example](https://godbolt.org/z/7aEzKK) - In the example there are 18 reallocs when N == 100000 and the total number of elements getting moved around is likely to be large. I didn't make a true test for that though, but, it's easy to avoid, and can as far as I know only help. :) – Ted Lyngmo Dec 22 '20 at 21:16
Yeah but the push_back is still amortized O(1) with low constant. Elements get (re)written on average 2 to 3 times. In this case only [2.0486 times](https://godbolt.org/z/jesT4s). And relocating is very low level and very cache friendly. I wish the OP told us the source so we could test there. Right now my guess is that it's the IO, perhaps the OP didn't even show us their real code but a version where they removed using `endl`. I mean, without separation between the printed numbers, I'd expect their code to be rejected on the very first and small test case, not even get tested on a large one. – Kelly Bundy Dec 22 '20 at 21:41
Oh, one more thing I just noticed: The `vector v(n);` takes some time as well, right? At least at some point it's gonna write n zeros? Unless there's some trickery like the OS having real pre-zeroed memory available already or so. But I wouldn't count on that. So with your way, you'd average 2 writes, one for the initial zeros and one for the real values. Almost the same as the 2.0486. Granted, zeros might be faster, but still. – Kelly Bundy Dec 22 '20 at 21:50
@KellyBundy Perhaps removing some C++ overhead could make the output better. I haven't looked into the numbers you've presented properly but I'm getting the feeling that you'd prefer allocating one big chunk (uninitialized)? I'm all for that if the initialization of the vector becomes costly. I made a few versions of containers a while back. They where for 2D containers and I also noticed that by leavinging them uninitialized when I knew I'd write before reading, everything became quicker. [old tinkering](https://stackoverflow.com/a/58122229/7582247) – Ted Lyngmo Dec 22 '20 at 23:21
Nah, regarding *costs* for allocation/initialization/relocation, I don't think I prefer any version, as I suspect all are fast and not the OP's problem. I do prefer your way for its elegance, though. – Kelly Bundy Dec 23 '20 at 03:46
@KellyBundy When in doubt, test it. I added tests to my answer. – Ted Lyngmo Dec 23 '20 at 12:51
I'd say that's nice for what it really is, but as presented, it's rather misleading. You say filling can be made 26x faster. I'd say filling includes filling with actual read values. And when I do that, your improved versions seem [barely any faster at all](https://quick-bench.com/q/bgr9jXvmr2_Uo2WJYbfod3fjhXU). And even *that* is only a part of the even bigger whole story, which includes the sorting and the outputting. So in the whole story, the speed impact of those improvements is pretty much nothing. (This all assumes that an `istringstream` is comparable to `std::cin`.) – Kelly Bundy Dec 23 '20 at 20:23
@KellyBundy I made a clarification when it comes to the filling part. I think `std::cin` is _way_ much slower than an `istringstream` (but haven't actually tested it) which would make the relative impact of my suggested change even smaller I guess - but your benchmark contains the creation of an `istringstream` from a very large string inside the measured loop which is bound to color the results quite a lot. – Ted Lyngmo Dec 23 '20 at 23:40
Yes, I also suspect that if they're not comparable, then `istringstream` is the faster one. About your last point: I don't think the creation of the `istringstream` colors the results a lot, or even noticeably. Did you see my fourth benchmark there, the "Baseline" one? That also creates the `istringstream`, and reads the first number from it. And it takes almost no time in comparison to the other three benchmarks. That's the point of that baseline benchmark - to show that that initialization doesn't matter. Did I mess that up? – Kelly Bundy Dec 24 '20 at 02:16
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/226373/discussion-between-ted-lyngmo-and-kelly-bundy). – Ted Lyngmo Dec 24 '20 at 06:03

score 0 · Answer 2 · answered Dec 22 '20 at 15:51

0

One thing you can improve your code is the line of v.push_back(x);. You can add one line after vector<int> v; to pre allocate memory, v.reserve(n);;

This way you won't re-allocate memory again and again.

answered Dec 22 '20 at 15:51

CS Pei

10,869
1
27
46

score 0 · Answer 3 · answered Dec 22 '20 at 15:53

This is pretty much the faster you realistically do. Are you sure there isn't a faster way to solve the problem that sorting? Maybe you only need the largest or smallest number?

If you want to squeeze what's left of performance gains:

vector<int> v;
v.reserve(n); // this will help a bit

Addressing a.Li's comment:

If the numbers' range is small, you could simply count how many time each number appears. This would be faster for number values in up to ten thousand, probably.

Algorithm: improve c++ sorting

3 Answers3