
I'm trying to write a sort-and-sweep broadphase system, and I've run into a performance problem during the overlap reporting stage.

My pair reporting code is where the bottleneck is:

The basic idea is to generate a temporary list of overlap pairs for every axis, then, for every pair in X, check whether the same pair also exists in Y and Z. Some extra checks in the pair generation deal with the stacked-cube and containment edge cases. The pair generation code is as follows:

//temporary pair generation for X axis
for (unsigned int i = 0; i < mXExtents.size()-1; i++)
{
    if (!mXExtents[i].mMax)
    {
        for (unsigned int j = i + 1; j < mXExtents.size(); j++)
        {
            if (mXExtents[j].mOwner->getID() == mXExtents[i].mOwner->getID())
            {
                break;
            }
            else
            {
                tempXPairs.push_back(new Pair(mXExtents[i].mOwner, mXExtents[j].mOwner));
            }
        }
    }
}
//temporary pair generation for Y axis
for (unsigned int i = 0; i < mYExtents.size()-1; i++)
{
    if (!mYExtents[i].mMax)
    {
        for (unsigned int j = i + 1; j < mYExtents.size(); j++)
        {
            if (mYExtents[j].mOwner->getID() == mYExtents[i].mOwner->getID())
            {
                break;
            }
            else
            {
                tempYPairs.push_back(new Pair(mYExtents[i].mOwner, mYExtents[j].mOwner));
            }
        }
    }
}
//temporary pair generation for Z axis
for (unsigned int i = 0; i < mZExtents.size()-1; i++)
{
    if (!mZExtents[i].mMax)
    {
        for (unsigned int j = i + 1; j < mZExtents.size(); j++)
        {
            if (mZExtents[j].mOwner->getID() == mZExtents[i].mOwner->getID())
            {
                break;
            }
            else
            {
                tempZPairs.push_back(new Pair(mZExtents[i].mOwner, mZExtents[j].mOwner));
            }
        }
    }
}
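
All three loops follow the same pattern and differ only in which extents container they read, so the per-axis sweep can be factored into a single helper. Sketched here with minimal placeholder types standing in for my real Extent/Pair/owner classes (and value-typed pairs rather than `new`'d ones):

```cpp
#include <cstddef>
#include <vector>

// Minimal placeholder types standing in for the real classes.
struct Object {
    int id;
    int getID() const { return id; }
};
struct Extent {
    bool mMax;       // true if this endpoint is the owner's max
    Object* mOwner;
};
struct Pair {
    Object* a;
    Object* b;
};

// Same sweep as each axis loop above: for every min endpoint, pair its
// owner with every owner encountered before the matching max endpoint.
void generatePairs(const std::vector<Extent>& extents, std::vector<Pair>& out)
{
    for (std::size_t i = 0; i + 1 < extents.size(); ++i) {
        if (extents[i].mMax)
            continue;
        for (std::size_t j = i + 1; j < extents.size(); ++j) {
            if (extents[j].mOwner->getID() == extents[i].mOwner->getID())
                break; // reached this owner's max endpoint
            out.push_back({extents[i].mOwner, extents[j].mOwner});
        }
    }
}
```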

The bottleneck, found through profiling, occurs when pairs are compared via the == operator. I suspect this is due to the sheer number of such checks being carried out rather than the overhead of any individual check.

Pair reporting code as follows:

    bool found = false;
    //now search Y & Z temp storage for matching pairs
    for (unsigned int i = 0; i < tempXPairs.size(); i++)
    {
        if (tempXPairs[i] != nullptr)
        {
            //search Y first
            for (unsigned int j = 0; j < tempYPairs.size(); j++)
            {
                if (tempYPairs[j] != nullptr)
                {
                    //match found in Y
                    if (*tempXPairs[i] == *tempYPairs[j])
                    {
                        //make a quick copy and stop searching
                        found = true;
                        delete tempYPairs[j];
                        tempYPairs[j] = nullptr;
                        break;
                    }
                }
            }
            //element in Y found
            if (found)
            {
                found = false;
                //search Z temp list for a match
                for (unsigned int j = 0; j < tempZPairs.size(); j++)
                {
                    if (tempZPairs[j] == nullptr)
                        continue;
                    //match in Z found
                    if (*tempXPairs[i] == *tempZPairs[j])
                    {
                        //if we are at this stage then we have a triple match, so an overlap on all axes.
                        //add the pair to the manager
                        mPairManager->addpair(tempXPairs[i]);
                        //delete extraneous pairs
                        delete tempZPairs[j];
                        tempZPairs[j] = nullptr;
                        //clear variables 
                        tempXPairs[i] = nullptr;
                        //and end search
                        break;
                    }

                }
                //whether or not a Z match was found, we're done with this X pair
                delete tempXPairs[i];
                tempXPairs[i] = nullptr;
            }
            else
            {
                delete tempXPairs[i];
                tempXPairs[i] = nullptr;
            }
        }
    }
    //finally clear temp storage
    for (unsigned int i = 0; i < tempXPairs.size(); i++)
    {
        if (tempXPairs[i] != nullptr)
        {
            delete tempXPairs[i];
        }
    }
    for (unsigned int i = 0; i < tempYPairs.size(); i++)
    {
        if (tempYPairs[i] != nullptr)
        {
            delete tempYPairs[i];
        }
    }
    for (unsigned int i = 0; i < tempZPairs.size(); i++)
    {
        if (tempZPairs[i] != nullptr)
        {
            delete tempZPairs[i];
        }
    }

The material I have read on sort and sweep/sweep and prune doesn't explain a fast way to search for duplicate pairs, or an efficient way to find equivalent pairs on the other axes. I'm clearly missing something, so I would appreciate any help that could be given.

Ian Young
    Looking at your code, I'm a bit surprised you're using *delete*. You need to find a way of doing it without calling into the kernel, which will almost certainly include a lock and other things, to free memory in the middle of your inner loops. – Robinson Jan 15 '16 at 12:21
  • That's one area where things could be optimised, but it's not where the main bottleneck is. Duly noted though. – Ian Young Jan 15 '16 at 12:31
  • If you write 'new' or 'delete' in C++ code then it is code smell. Simple objects (like pairs) should typically be by-value in containers, complex polymorphic objects should be made by 'std::make_shared' or 'std::make_unique'. Result is zero 'new's and 'delete's in modern C++ code. – Öö Tiib Jan 15 '16 at 12:52
  • I understand and agree with that, and as I said, it's an area for optimisation, but it doesn't address the real bottleneck, which is in the pair reporting (second) section of the code. – Ian Young Jan 15 '16 at 13:00

1 Answer


The performance problem does not surprise me here, given this algorithm's apparent quadratic complexity.

It's not clear what class, or POD type, is returned by getID(). For the sake of clarity, let's say that getID() returns an IdType class, and that mXExtents is a container of ExtentsType objects.

If IdType implements strict weak ordering (which means that it implements operator<, in addition to operator==), and you believe you will achieve better performance if the number of IdType comparisons is reduced, then I would suggest creating a

 std::multimap<IdType, ExtentsType *> lookup;

Now populate lookup with a single pass over mXExtents, inserting each value's IdType and a pointer to the original ExtentsType instance into the multimap. Each insert operation has logarithmic complexity, and once everything is inserted, a single pass over the multimap makes it trivial to grab all instances of ExtentsType that share the same IdType. Since each insert has only logarithmic complexity, I would expect the total number of comparisons on the first pass to be much smaller.
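
For illustration, here's a sketch of that first pass. Since I don't know your real definitions, IdType is a plain int here (you mentioned the id is an integer) and ExtentsType is reduced to a bare-bones struct:

```cpp
#include <map>
#include <vector>

using IdType = int;  // hypothetical stand-in for your real id type
struct ExtentsType {
    IdType ownerId;
    bool isMax;
};

// Single pass over the extents: each insert is O(log n), so building the
// whole index is O(n log n), instead of comparing every element to every
// other one.
std::multimap<IdType, ExtentsType*> buildLookup(std::vector<ExtentsType>& extents)
{
    std::multimap<IdType, ExtentsType*> lookup;
    for (auto& e : extents)
        lookup.insert({e.ownerId, &e});
    return lookup;
}
```

A second, linear pass over lookup then visits all endpoints with the same id contiguously (e.g. via equal_range), since a multimap keeps equal keys adjacent.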

Of course, the second pass will have linear complexity, but this looks to me like the easiest low-hanging fruit to try first, to see if this fixes your suspected bottleneck.

Another potential variation would be to use a std::multiset instead of a std::multimap, with a custom comparator class. This makes it possible to employ some additional optimizations, based on the internal relationship between each ExtentsType and its inner IdType instance, and to eliminate many of the internal copy/move constructions that are happening here too. Those constructions currently inherit the original algorithm's quadratic complexity; they would be reduced accordingly by switching to a multimap, and possibly eliminated completely by using a custom multiset comparator.
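
A sketch of that variation, again with the same hypothetical minimal types:

```cpp
#include <set>

using IdType = int;  // hypothetical stand-in for your real id type
struct ExtentsType {
    IdType ownerId;
    bool isMax;
};

// Custom comparator: orders ExtentsType values directly by their inner id,
// so elements are stored by value with no separate key to copy around.
struct CompareByOwnerId {
    bool operator()(const ExtentsType& a, const ExtentsType& b) const
    {
        return a.ownerId < b.ownerId;
    }
};

using ExtentsSet = std::multiset<ExtentsType, CompareByOwnerId>;
```

All endpoints sharing an id are then adjacent in the multiset, and equal_range retrieves them with O(log n) comparisons.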

Sam Varshavchik
  • Yes. It also occurs to me that there's a broadphase class in the *bullet* library that does sort and sweep. Well tested and very efficient. http://www.bulletphysics.org/mediawiki-1.5.8/index.php/Broadphase – Robinson Jan 15 '16 at 12:22
  • @Sam Varshavchik The id is an integer, which is unique to each object the Extent instance represents. The check is done so the algorithm knows to stop when it finds another Extent of the same ID, i.e. the max extent, and so, can stop generating pairs for that ID. – Ian Young Jan 15 '16 at 12:33
  • @Robinson This is for my honours project, so I can't use an external library for something I'm supposed to write myself. – Ian Young Jan 15 '16 at 12:34
  • 1
    You don't have to use it but you can study it to see how it's done. – Robinson Jan 15 '16 at 12:40
  • A fair point. I'd check it out but the link on the page you linked is empty. – Ian Young Jan 15 '16 at 12:41
  • Strange. The URL links to the bullet wiki's *broadphase* page. Try google. https://www.google.co.uk/search?q=wiki+broadphase+bullet&ie=&oe= – Robinson Jan 15 '16 at 13:59
  • std::multimap is not an "external library". It's part of the standard C++ library. Nothing wrong with looking and studying a similar algorithm, from another library, but unless you are prohibited from using the standard C++ library, borrowing std::multimap's logarithmic complexity should be sufficient. – Sam Varshavchik Jan 15 '16 at 19:48