4

I need to make lots of unions of ordered set of integers (I would like to avoid duplicates, but it is okay if there are).

This is the code with the best performance so far :

// some code added for better understanding
std::vector< std::pair<std::string, std::vector<unsigned int> > vec_map;
vec_map.push_back(std::make_pair("hi", std::vector<unsigned int>({1, 12, 1450});
vec_map.push_back(std::make_pair("stackoverflow", std::vector<unsigned int>({42, 1200, 14500});

std::vector<unsigned int> match(const std::string & token){
    auto lower = std::lower_bound(vec_map.begin(), vec_map.end(), token, comp2());
    auto upper = std::upper_bound(vec_map.begin(), vec_map.end(), token, comp());

    std::vector<unsigned int> result;

    for(; lower != upper; ++lower){
        std::vector<unsigned int> other = lower->second;
        result.insert(result.end(), other.begin(), other.end());
    }
    std::sort(result.begin(), result.end()); // This function eats 99% of my running time

    return result;
}

valgrind (using the tool callgrind) tells me that I spend 99% of the time doing the sort.

This is what I tried so far :

  • Using std::set (very bad performance)
  • Using std::set_union (bad performance)
  • maintaining a heap with std::push_heap (50% slower)

Is there any hope to gain somehow some performance? I can change my containers and use boost, and maybe some other lib (depending on its licence).

EDIT integers can be as big a 10 000 000 EDIT 2 gave some example of how I use it, because of some confusion

Hasturkun
  • 35,395
  • 6
  • 71
  • 104
Tristram Gräbener
  • 9,601
  • 3
  • 34
  • 50
  • Are the integers all in the range 0..31? If so, you could just bitwise-OR together a whole bunch of bitmasks.... – jwodder Oct 04 '11 at 17:19
  • Is there any pattern in `result` after the for block? Seems like there should be. – Captain Giraffe Oct 04 '11 at 17:20
  • Nope, integers can go to a few millions :( – Tristram Gräbener Oct 04 '11 at 17:21
  • Actually, as long as the integers aren't too sparse, you could just OR together a bunch of [bitsets](http://cplusplus.com/reference/stl/bitset/). – jwodder Oct 04 '11 at 17:21
  • 3
    (1) Why do `lower_bound` and `upper_bound` have different comparisons? Maybe a bug? (2) Use `reserve` on result before you put anything in it. Huge savings. (3) make `other` a reference to a vector. Huge savings. (4) Why are you sorting? The ints were already sorted (or else you can't use `lower_bound`). You shouldn't need to sort again. – Mooing Duck Oct 04 '11 at 17:22
  • @CaptainGiraffe what do you mean by pattern? For now it's just the concatenation of all my integer sets, nothing more. – Tristram Gräbener Oct 04 '11 at 17:23
  • @Tristam I would expect some sort of order in `result`, as Mooing suggests. – Captain Giraffe Oct 04 '11 at 17:26
  • @MooingDuck (1) no bug there (2) no big difference on this specific part (3) I believe the compiler optimises it; I just added it for SO to be easier to read, but it didn't change anything performancewise (4) lower_bound works on a vector>> and is sorted. However, the concatenation of vectors isn't – Tristram Gräbener Oct 04 '11 at 17:26
  • 2
    @TristramGräbener: I just realized that vec_map is a map of vectors of ints, not a map of ints. Thus (several people's) confusion. `reserve` and the reference made little difference? That's surprising. Good compiler. – Mooing Duck Oct 04 '11 at 17:31
  • @Tristram How big is `result` ? ballpark. Also the reason for the need for `comp() / comp2()` is interesting. Every once in a while I wish I could +1 edits :) – Captain Giraffe Oct 04 '11 at 17:34
  • -1 voted to close as unreal. e.g. the line `std::vector other = lower->second;` is meaningless drivel, AFAICT. – Cheers and hth. - Alf Oct 04 '11 at 17:35
  • Sorry for the confusion. The sets are rather small (less than 1000 elts), so reserve changed nothing. The reference is rather easy to detect by the compiler. Leaving out the sort, the execution time goes from 3000ms down to 80ms. Maybe if I solve that problem, I can look closer to allocation bottlenecks. – Tristram Gräbener Oct 04 '11 at 17:38
  • 2
    Guesswork: a [merge](http://msdn.microsoft.com/en-us/library/9ew9xdb2%28v=VS.71%29.aspx) instead of an insert to the result, might make a difference. Should keep the result sorted. – Captain Giraffe Oct 04 '11 at 17:38
  • 4
    @AlfP.Steinbach: Uh, if `vec_map` is of type `std::map >` it would be fine. @OP: You might get better answers if you included a few small examples of sample input and expected output. – user786653 Oct 04 '11 at 17:39
  • @user786653: right, sorry. i find no way to remove the close-vote though. but that'll teach the OP to include declarations etc. as he/she should. :-) – Cheers and hth. - Alf Oct 04 '11 at 17:44
  • @AlfP.Steinbach lesson learned :) – Tristram Gräbener Oct 04 '11 at 17:45
  • @TristramGräbener You have several syntax errors (e.g. `usigned`) and your example is incomplete (what is `comp` and `comp2`?). It think you'd have a better chance of receiving a helpful answer if you: (1) include a minimal but complete code sample. (2) Explain your end goal more clearly. – Branko Dimitrijevic Oct 04 '11 at 18:40
  • 1
    You said that `result` is less than 1000 elements? I have a hard time believing that `std::sort` takes 2920ms to sort less than 1000 ints. Your problem is somewhere else. – Blastfurnace Oct 04 '11 at 19:06
  • 1
    @Blastfurnace I have serious problems being clear ;) Those 3secs are benchmarked making 300 runs. So it's about 10ms per sort. – Tristram Gräbener Oct 04 '11 at 20:16
  • Did it work better with merge? – Captain Giraffe Oct 04 '11 at 21:13
  • no, sorry, merge didn't help out very much :( – Tristram Gräbener Oct 05 '11 at 07:43
  • Merging is O(n log n) as is the sort; I wouldn't expect much improvement unless the lists you're merging are significant in size compared to the final result. – Mark Ransom Oct 05 '11 at 15:34

5 Answers5

2

This looks like an instance of multi-way merge. Depending on the input (profile and time!), the best algorithm might be what you have or something that builds the result incrementally by selecting the smallest integer from all the containers or something more complex.

user786653
  • 29,780
  • 4
  • 43
  • 53
1

ALTERNATIVE SOLUTION:

Method std::sort (if it is based on quick-sort) is very good sorting non-sorted vectorsO(logN), is even better with sorted vector, but if your vector is inverted sorted have O(N^2). It may happen that when you perform the union you have a lot of operands, with the first one containing bigger values than the followers.

I would try the following (I suppose elements from input vectors are already sorted):

  • As suggested by other people here you should reseve the required size on the results vector before start filling it.

  • In case std::distance(lower, upper) == 1 there is no reason to union, just copy the contento of the single operand.

  • Sort the operands of your union, maybe by size (bigger first), or if the ranges don't overlap or partially overlap just by first value), so that you maximize the number of elements that are already sorted on the next step. Probably the best is a strategy that keep in consideration both things SIZE and RANGE of every operand of your union. A lot depends on the real data.

  • If there are few operands with a lot of elements each, keep appending the elements on the back of the result vector, but after appending each vector (from the second) you can try to merge (std::inplace_merge) the old content with the appendend one, this also deduplicates elements for you.

  • If the number of operands is big (comparing to the total number of elements) then you should stay with your former strategy of sorting afterward but call std::unique to deduplicate after sort. In this case you should sort by the range of elements contained.

rressi
  • 21
  • 2
  • I studied a bit better my proposal and I find out there is a better approach: merge sort. – rressi May 11 '14 at 08:14
  • You append the slices of sorted elements from operands one after other (better if in appropriate order) than you call std::inplace_merge on each pair of slice again and again until you end up with a single slice. Std::sort is NlogN (probably in your case is even worst, remember that can degenerate up to N square!). – rressi May 11 '14 at 08:26
  • This application of merge sort, where you already have the divide part for free, you have just to conquer, you have a NlogK where K is the number of operands, that is at worst == to N, but usually << or just < than N. But the logK here is guaranteed. Moreover also deduplicate elements thanks to inplace_merge so you laso reducenthe number of elements in last iterations.... ;-) – rressi May 11 '14 at 08:27
1

A custom merge sort may give a tiny amount of help.

#include <string>
#include <vector>
#include <algorithm>
#include <map>
#include <iostream>
#include <climits>

typedef std::multimap<std::string, std::vector<unsigned int> > vec_map_type;
vec_map_type vec_map;
struct comp {
    bool operator()(const std::string& lhs, const std::pair<std::string, std::vector<unsigned int> >& rhs) const
    { return lhs < rhs.first; }
    bool operator()(const std::pair<std::string, std::vector<unsigned int> >& lhs, const std::string& rhs) const
    { return lhs.first < rhs; }
};
typedef comp comp2;

    std::vector<unsigned int> match(const std::string & token){
        auto lower = std::lower_bound(vec_map.begin(), vec_map.end(), token, comp2());
        auto upper = std::upper_bound(vec_map.begin(), vec_map.end(), token, comp());

        unsigned int num_vecs = std::distance(lower, upper);
        typedef std::vector<unsigned int>::const_iterator iter_type;
        std::vector<iter_type> curs;
        curs.reserve(num_vecs);
        std::vector<iter_type> ends;
        ends.reserve(num_vecs);
        std::vector<unsigned int> result;
        unsigned int result_count = 0;

        //keep track of current position and ends
        for(; lower != upper; ++lower){
            const std::vector<unsigned int> &other = lower->second;
            curs.push_back(other.cbegin());
            ends.push_back(other.cend());
            result_count += other.size();
        }
        result.reserve(result_count);
        //merge sort
        unsigned int last = UINT_MAX;
        if (result_count) {
            while(true) {
                //find which current position points to lowest number
                unsigned int least=0;
                for(unsigned int i=0; i< num_vecs; ++i ){
                    if (curs[i] != ends[i] && (curs[least]==ends[least] || *curs[i]<*curs[least]))
                        least = i;
                } 
                if (curs[least] == ends[least])
                    break;
                //push back lowest number and increase that vectors current position
                if( *curs[least] != last || result.size()==0) {
                    last = *curs[least];
                    result.push_back(last);
                            }
                ++curs[least];
            }
        }
        return result;
    }

    int main() {
        vec_map.insert(vec_map_type::value_type("apple", std::vector<unsigned int>(10, 10)));
        std::vector<unsigned int> t;
        t.push_back(1); t.push_back(2); t.push_back(11); t.push_back(12);
        vec_map.insert(vec_map_type::value_type("apple", t));
        vec_map.insert(vec_map_type::value_type("apple", std::vector<unsigned int>()));
        std::vector<unsigned int> res = match("apple");
        for(unsigned int i=0; i<res.size(); ++i)
            std::cout << res[i] << ' ';
        return 0;
    }

http://ideone.com/1rYTi

Mooing Duck
  • 64,318
  • 19
  • 100
  • 158
  • wouldn't the `result.push_back(**least)` Need to be *after* the loop that finds the current position pointing to the lowest number? – Artomegus Oct 04 '11 at 18:31
  • yup. I'd changed the indentation at one point and apparently mistook what belonged in which loop. – Mooing Duck Oct 04 '11 at 18:35
  • Ever since I learned about `result_count --> 0` I don't make mistakes with decrementing unsigned integers anymore :D http://stackoverflow.com/questions/1642028/what-is-the-name-of-this-operator – Mooing Duck Oct 04 '11 at 18:38
  • BTW I was also thinking along the same lines, wondering if a custom multi-merge would help. This could also easily be changed into a set union by keeping track of the last value pushed. – Artomegus Oct 04 '11 at 18:45
  • @Artomegus: Right, I'd merged duplicates too. Fix'd. – Mooing Duck Oct 04 '11 at 19:01
  • Almost there... 1) You need a `last = **least;` in a block with the `result.push_back...`, 2) there's a semicolon instead of a closing parenthesis in `if (*least == ends[0];` – Artomegus Oct 04 '11 at 19:17
  • alright, not it actually _runs_ as well as compiles. I was comparing iterators from different containers before, which you aren't allowed to do. – Mooing Duck Oct 04 '11 at 19:21
  • Still need a `last = *curs[least];` in there with the push_back, perhaps in the same statement: `result.push_back(last = *curs[least]);`? – Artomegus Oct 04 '11 at 19:30
  • @Artomegus: Sorry about that, I didn't think that through at all. – Mooing Duck Oct 04 '11 at 19:33
  • This approach gives a slight improvement, but nothing big. Thank you a lot for your effort. – Tristram Gräbener Oct 05 '11 at 07:46
  • @TristramGräbener: If you're using vectors, references, and a multiway merge sort, that's about as good as it gets. Sorry. – Mooing Duck Oct 05 '11 at 16:02
0

If the number of elements is relatively large percentage of range of possible ints, you might get a decent performance out of what is essentially a simplified "hash join" (to use DB terminology).

(If there is a relatively small number of integers compared to range of possible values, this is probably not the best approach.)

Essentially, we make a giant bitmap, then set flags only on indexes corresponding to the input ints and finally reconstruct the result based on these flags:

#include <vector>
#include <algorithm>
#include <iostream>
#include <time.h>

template <typename ForwardIterator>
std::vector<int> IntSetUnion(
    ForwardIterator begin1,
    ForwardIterator end1,
    ForwardIterator begin2,
    ForwardIterator end2
) {

    int min = std::numeric_limits<int>::max();
    int max = std::numeric_limits<int>::min();

    for (auto i = begin1; i != end1; ++i) {
        min = std::min(*i, min);
        max = std::max(*i, max);
    }

    for (auto i = begin2; i != end2; ++i) {
        min = std::min(*i, min);
        max = std::max(*i, max);
    }

    if (min < std::numeric_limits<int>::max() && max > std::numeric_limits<int>::min()) {

        std::vector<int>::size_type result_size = 0;
        std::vector<bool> bitmap(max - min + 1, false);

        for (auto i = begin1; i != end1; ++i) {
            const std::vector<bool>::size_type index = *i - min;
            if (!bitmap[index]) {
                ++result_size;
                bitmap[index] = true;
            }
        }

        for (auto i = begin2; i != end2; ++i) {
            const std::vector<bool>::size_type index = *i - min;
            if (!bitmap[index]) {
                ++result_size;
                bitmap[index] = true;
            }
        }

        std::vector<int> result;
        result.reserve(result_size);
        for (std::vector<bool>::size_type index = 0; index != bitmap.size(); ++index)
            if (bitmap[index])
                result.push_back(index + min);

        return result;

    }

    return std::vector<int>();

}

void main() {

    // Basic sanity test...
    {

        std::vector<int> v1;
        v1.push_back(2);
        v1.push_back(2000);
        v1.push_back(229013);
        v1.push_back(-2243);
        v1.push_back(-530);

        std::vector<int> v2;
        v1.push_back(2);
        v2.push_back(243);
        v2.push_back(90120);
        v2.push_back(329013);
        v2.push_back(-530);

        auto result = IntSetUnion(v1.begin(), v1.end(), v2.begin(), v2.end());

        for (auto i = result.begin(); i != result.end(); ++i)
            std::cout << *i << std::endl;

    }

    // Benchmark...
    {

        const auto count = 10000000;

        std::vector<int> v1(count);
        std::vector<int> v2(count);

        for (int i = 0; i != count; ++i) {
            v1[i] = i;
            v2[i] = i - count / 2;
        }

        std::random_shuffle(v1.begin(), v1.end());
        std::random_shuffle(v2.begin(), v2.end());

        auto start_time = clock();
        auto result = IntSetUnion(v1.begin(), v1.end(), v2.begin(), v2.end());
        auto end_time = clock();
        std::cout << "Time: " << (((double)end_time - start_time) / CLOCKS_PER_SEC) << std::endl;
        std::cout << "Union element count: " << result.size() << std::endl;

    }

}

This prints...

Time: 0.402

...on my machine.

If you want to get your input ints from something else other than std::vector<int>, you can implement your own iterator and pass it to IntSetUnion.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
0

You're sorting integers with a limited range, this is one of those rare circumstances where a radix sort could be used. Unfortunately the only way to know if this beats the generalized sort is to try it.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622