Improve std::sort performance

Question

I have an index with about 10k items, which have to be sorted caseinsensitive lexicographically.

This is my approach:

bool lowercomp (AbstractServiceProvider::AbstractItem*  i, AbstractServiceProvider::AbstractItem* j)

{
    std::string a,b;

    // lower first string
    a.resize(i->title().length());
    std::transform(i->title().cbegin(), i->title().cend(), a.begin(),
                std::bind2nd(std::ptr_fun(&std::tolower<char>), std::locale("")));

    // lower 2nd string
    b.resize(j->title().length());
    std::transform(j->title().cbegin(), j->title().cend(), b.begin(),
                std::bind2nd(std::ptr_fun(&std::tolower<char>), std::locale("")));

    return 0 > a.compare(b);
}

Somwhere in my code:

t = new boost::timer::auto_cpu_timer;
std::sort(_index.begin(), _index.end(), lowercomp);
delete t;

But this takes about 4 seconds. Without the toLower part, it takes about 0.003 seconds. Is there a way to improve this?

That part I doubt it. `std::sort` is optimized as is. Any further optimization would require rather custom advanced trickery. — 101010, Sep 08 '14 at 12:29
Perhaps look at [Case insensitive string comparison in C++](http://stackoverflow.com/questions/11635/case-insensitive-string-comparison-in-c)? — crashmstr, Sep 08 '14 at 12:29
I don't exclude you might do a comparison with a collation table to speed up things.. however there's some unnecessary overhead in your lowercomp function — Marco A., Sep 08 '14 at 12:29
Declaring the arguments of your comparison function to be `const` might enable some optimizations by the compiler. . Also note that the comparison function of [`std::sort`](http://en.cppreference.com/w/cpp/algorithm/sort) should return true if the first argument is ***less*** than the second argument, you have it the opposite way. — Some programmer dude, Sep 08 '14 at 12:31
Use a profiler to find out what part of the code takes most of the 4 seconds. My wild guess with no empirical measurements is that you can move the functor object outside the std::transform call. Create one `tolower` functor, and re-use it. — Tadeusz A. Kadłubowski, Sep 08 '14 at 12:32
You should separate the lowering of the case of the string and the sort. The sort functor would be called multiple time with the same element (already lowered), in this case you would lost performance doing double work. Lower the case of all the element of the container, then sort. — NetVipeC, Sep 08 '14 at 12:34

John Zwinck · Answer 1 · 2014-09-08T14:08:10.127

4

You can definitely make it much faster. The solution is to avoid allocating memory, and instead compare the strings case-insensitively by converting one character at a time using tolower() while doing the comparison. There should be no construction of class objects in the comparison function. Something like this:

bool lowercomp(const AbstractItem* lhs, const AbstractItem* rhs)  
{
    size_t size = std::min(lhs->title().size(), rhs->title().size());
    for (size_t pos = 0; pos < size; ++pos) {
        if (tolower(lhs->title()[pos]) < tolower(rhs->title()[pos]) {
            return true;
        } else if (tolower(lhs->title()[pos]) > tolower(rhs->title()[pos]) {
            return false;
        }
    }
    return lhs->title().size() < rhs->title().size();
}

Let us know how fast that is. :)

edited Sep 08 '14 at 14:08

answered Sep 08 '14 at 12:33

John Zwinck

239,568
38
324
436

3

`size_t size = std::max(...)` leads to access violation, correct is `std::min` – BeyelerStudios Sep 08 '14 at 12:40
@BeyelerStudios: Indeed! Thanks for the correction, a silly mistake on my part. – John Zwinck Sep 08 '14 at 12:42
I agree. Try to write a compare function, which directly works on i->title(). If you can, try to pass the AbstractItems by const reference. – midor Sep 08 '14 at 12:48
Regretfully, the code presented doesn't work correctly, and using it for a comparison function will result in undefined behavior, because it doesn't establish a proper ordering. – James Kanze Sep 08 '14 at 13:13
@ManuelSchneid3r: I missed the `else if` case. Sorry about that, I hope it works better now. It was originally intended to be more of a sketch, but I guess it's bound to become a full-blown working solution. :) Without that case, "db" would be less than "bc" which is wrong (because "bc" would also be less than "db" which is a contradiction). – John Zwinck Sep 08 '14 at 14:08

score 4 · Accepted Answer · answered Sep 08 '14 at 13:10

Until you've seen the profiler output, to know where the slowdown is, you can't be sure, but there are a number of points which seem likely to cause a slowdown to me. The two most important are:

your function creates two new strings at each call. That can be very expensive, and
you use the two operand form of std::tolower; this function must extract the ctype facet each time it is called (and you construct a new temporary instance of the locale each time you invoke lowercomp.

My own preference is to use a functional object for the comparison. With some compilers, it's faster, but in this case, it's also a lot cleaner:

class CaseInsensitiveCompare
{
    std::locale myLocale;   //  To ensure lifetime of the facet.
    std::ctype<char> const& myCType;
public:
    CaseInsensitiveCompare( std::locale const& locale = std::locale( "" ) )
        : myLocale( locale )
        , myCType( std::use_facet<std::ctype<char>>( myLocal ) )
    {
    }
    bool operator()( AbstractItem const* lhs, AbstractItem const* rhs ) const
    {
        return (*this)( lhs->title(), rhs->title() );
    }
    bool operator()( std::string const& lhs, std::string const& rhs ) const
    {
        return std::lexicographical_compare(
            lhs.begin(), lhs.end(),
            rhs.begin(), rhs.end(),
            *this);
    }
    bool operator()( char lhs, char rhs ) const
    {
        return myCType.tolower(lhs) < myCType.tolower(rhs);
    }
};

Beyond this, there are several other points which might improve performance:

If you're sure of the lifetime of the locale you're using (and you usually can be), drop the myLocale member in the class; copying the locale will be the most expensive part of copying instances of this class (and the call to lexicographical_compare will copy it at least once).
If you don't need the localization features, consider using the tolower function in <cctype>, rather than the one in <locale>. This will avoid the need of any data members at all in the comparison.
Finally, although I'm not sure it's worth it for something as small as 10K items, you might consider making vectors with the canonical forms of the strings (already lower cased, etc.), sorting those using just < on the strings, and then reordering the original vectors according to that.

Also, I'm very suspicious of the `new boost::timer::auto_cpu_timer'. Do you really need dynamic allocation here? Off hand, I suspect a local variable would be more appropriate.

Wow nice! Where can I read more about this way of comparing. What are the relevant buzzwords for this kind of "compare-class"? — ManuelSchneid3r, Sep 08 '14 at 14:10
I'm not aware of any articles which describe the above; I just sort of worked it out when I had a similar problem (comparing file paths where each path was a vector of strings---Windows file names are case-insensitive, so that's how I had to compare the strings). There are a fair number of articles recommending functional objects instead of functions, but for the rest, I pretty much came up with it myself. — James Kanze, Sep 08 '14 at 17:34

score 0 · Answer 3 · answered Sep 08 '14 at 12:42

0

Your implementation strikes me as extraordinarily inefficient. I see several problems.

You are performing a tolower on both strings within the sort comparator. Since this comparator will be called on the order of n log n times, you are going to be tolowering two strings about 40K times (?) each.

I would not want to compare strings at all. Not only is a string comparison orders of magnitude less efficient than other methods (for example integral comparison), it's also error-prone and requires that you scrub the data -- another source of inefficiency.

If however you must compare strings, scrub them before you perform the sort. That includes tolowering them. Ideally, scrub the data upon element construction. Barring that, you could even scrub it just before you call sort. Whatever you do, don't scrub it within the comparator.

answered Sep 08 '14 at 12:42

John Dibling

99,718
31
186
324

The tolower()ing is unlikely to be the most important performance problem here--construction of so many std::strings and std::locales is. – John Zwinck Sep 08 '14 at 12:53
I agree with you on the general idea of preparing data beforehand, but we don't know if he needs the original information, which I suppose he does. He could either store an additional scrubbed string in the AbstractItem or lower the chars every single time, depending on whether he wants to save memory or computation. – midor Sep 08 '14 at 13:03
@midor Scrubbing is worth the effort when there are a lot of elements involved. But is 10K really a lot? – James Kanze Sep 08 '14 at 13:12
@James Kanze: I agree, that is why I said he can do either, based on whether he wants to save memory or computation. Yet, I think the number of elements is not as important as the number of times each element is expected to be touched by operations requiring the lower case version when considering this trade-off. If I give you 10 elements, which are needed in lower-case 10M times you would probably prefer the 10 copies over the 10M tolowers. I think from what he is asking for, he will probably do searches on his data-structure, which will also requires tolower-ops. Measure, then optimize it. – midor Sep 08 '14 at 13:20
@midor The number of times you have to do the operation on the same element is of course the key. I was only considering the price of doing it just for the sort; sorting 10K elements probably won't result in comparing the same element that often, probably less than 100 times. And the cost of allocating and filling new vectors with the canonical forms is high. If the strings are going to be compared often, it might be worth maintaining them as a pair, however. – James Kanze Sep 08 '14 at 17:30

Improve std::sort performance

3 Answers3