
I have a Managed VS2010 C++ project using toolset v10. What I can't figure out is why, when I compile the project in the Debug configuration, the hash_map destructor is exceptionally slow. hash_map.clear() is slightly faster, but just as painful.

Note: this cannot be reproduced on VS2015 + Win10. Most likely a VS2010 issue.

I looked around the web, but nothing explains what I am seeing.

1) I checked the _NO_DEBUG_HEAP=1 environment setting. It doesn't apply here because I am not debugging through VS; I simply compile the code in the Debug configuration and run it without the debugger.

2) This is not about inserting a lot of data; the insertion is fine. This is about simply clearing the data from the hash_map.

3) I thought that turning off C++ Exceptions under the C/C++ Code Generation settings might fix the problem, but that is not the case either.

If I compile the code in the Release configuration, destruction is instant. If I compile the code in the Debug configuration, destruction takes 5 minutes or longer, depending on how big the data is.

I am sure it is just some C++ project setting I need to correct. Does anyone know how to fix this?

To recap: my project is VS2010 Managed C++ (mixed native C++ and managed C# objects), toolset v10. When I compile the code in the Debug configuration, it takes 5 minutes to destroy the hash_map (slower than inserting the data itself). When I compile the code in the Release configuration, it is instant.

How do I fix this? Thank you.

Here is a full repro in a new project I made using VS2010. It takes less than 5 seconds to insert the items, myMap.clear() takes 202 seconds, and the myMap destructor takes 280 seconds.

Here is what I did.

1) Create new C++ Console App using VS2010...

2) Set Configuration Properties...

2.1) General > Common Language Runtime Support > No Support....

2.2) General > Character Set > Use Multi-Byte...

2.3) C/C++ > General > Common Language Runtime Support > Yes /clr...

2.4) C/C++ > General > Debug Information Format > Program Database /Zi...

2.5) C/C++ > Code Generation > Enable Minimal Rebuild > No /Gm-...

2.6) C/C++ > Code Generation > Enable C++ Exceptions > Yes with SEH /EHa...

   Use Koby's code below.

More updates:

This also appears to be hardware dependent? There is a big insertion performance difference between 30K and 40K items. Insertion gets stuck around the 32K mark for a while, then quickly inserts the rest of the items. When that happens, map.clear() or the destructor becomes massively slow.

Maybe the initial capacity differs on different machines? I am using a VM with 4 GB of RAM on 64-bit Windows 7. The program is 32-bit, built in the Debug configuration.
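
For reference, here is a minimal sketch (not part of the original test) of how rehash points can be observed during insertion. It assumes std::unordered_map, which exposes bucket_count(); stdext::hash_map does not. The pause around 32K items may coincide with the bucket table growing, but that is an assumption, not something I have verified on VS2010.

#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <utility>

int main()
{
    std::unordered_map<unsigned int, float> map;
    std::size_t buckets = map.bucket_count();
    std::cout << "initial buckets: " << buckets << '\n';

    for (unsigned int i = 0; i < 40000; ++i) {
        map.insert(std::make_pair(i, i + 0.004f));
        if (map.bucket_count() != buckets) {
            // The bucket table just grew, i.e. a rehash happened here.
            buckets = map.bucket_count();
            std::cout << "rehash after item " << i
                      << ", buckets now " << buckets << '\n';
        }
    }
    return 0;
}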

Results from Koby's code, running the compiled exe outside of VS2010.
Testing with 30K items:

TestClear()
Insertion: 0.163s
Clear: 0.075s

TestDestructor()
Insertion: 0.162s
Destruction: 4.262s

Testing with 40K items:

TestClear()
Insertion: 4.552s
Clear: 197.363s

TestDestructor()
Insertion: 4.49s
I gave up, since the destructor was already much slower in the 30K result.

Testing on VS2015 and 64-bit Win10:

Testing with 4M items:

TestClear()
Insertion: 8.988s
Clear: 0.878s

TestDestructor()
Insertion: 9.669s
Destruction: 0.869s

Conclusion: something is most likely wrong with VS2010. I will mark this as resolved because VS2010 is far too old; I will try to upgrade the entire solution to a newer VS instead. Thank you, Koby.

BoBoDev
  • Please remove the C# tag from your question. – Koby Duck Oct 18 '17 at 02:51
  • OK, deleted. I had it because I am in a managed C++ project though. – BoBoDev Oct 20 '17 at 00:18
  • Thank you. Note that managed C++ implies the .NET framework, not C#. C# is a language that compiles to .NET code. That said, the C++ tag is fine for anything related to C++, even managed C++. – Koby Duck Oct 20 '17 at 16:26

1 Answer


TL;DR: stdext::hash_map isn't the right tool for the job. It's a non-standard extension that was deprecated in MSVC years ago. Read up on std::unordered_map; it should meet your general performance requirements. Also, see this.

I'm using VS 2017 15.3.5 at the time of this writing, so my copy of the standard library headers is reasonably up to date. I no longer have a copy of the VS 2010 header files, so my answer may not entirely apply to your version. If possible, you should really upgrade.

First let's analyze hash_map.clear():

void clear() _NOEXCEPT
    {   // erase all
    _List.clear();
    _Init();
    }

_List is a std::list. Clearing it requires bouncing around memory destroying objects until all nodes have been cleaned up. While necessary for a linked list, this is terrible for performance: linked lists typically result in many cache misses. This is the majority of the problem.
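
Conceptually, the work boils down to a pointer-chasing loop like the following sketch (illustrative only, not the actual Dinkumware _List internals):

// Illustrative sketch, not the real _List implementation.
struct Node { Node* next; /* the key/value pair lives here */ };

void ClearNodes(Node* head)
{
    while (head != nullptr) {
        Node* next = head->next; // each hop is a likely cache miss
        delete head;             // one deallocation per element
        head = next;
    }
}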

Now let's take a look at _Init():

void _Init(size_type _Buckets = _Min_buckets)
    {   // initialize hash table with _Buckets buckets, leave list alone
    _Vec.reserve(2 * _Buckets); // avoid curdling _Vec if exception occurs
    _Vec.assign(2 * _Buckets, _Unchecked_end());
    _Mask = _Buckets - 1;
    _Maxidx = _Buckets;
    }

_Vec is a std::vector of iterators. _Min_buckets is 8 in my copy, and reserve() only grows, never shrinks, so we can assume that a reallocation is not occurring most of the time (from clear(), at least). However, between these calls it performs several branches, a potentially large number of destructor calls, and 16 iterator assignments (two per bucket). This may not seem like much, but it can add up quickly.

There's not much you can do about this without writing your own hash map or using an alternate implementation. Fortunately the standard provided us with std::unordered_map.
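
For reference, a minimal sketch of what the switch looks like with the types from the question; std::unordered_map shares the relevant interface (insert, clear, find, operator[]), so changing the typedef is usually the only code change:

#include <unordered_map>
#include <utility>

// Same key/value types as the question; only the container type changes.
typedef std::unordered_map<unsigned int, float> Collection;
typedef std::pair<unsigned int, float> Pair;

int main()
{
    Collection map;
    map.insert(Pair(42u, 0.5f));
    map.clear(); // nodes are still freed one by one, but the bucket
                 // bookkeeping is the standard, maintained implementation
    return 0;
}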

Update:
I ran your example in VS 2015 and VS 2017, changing all the settings you mentioned in an attempt to reproduce the slow scenario. In all cases it completed quickly, never more than a second.

Your example doesn't measure the clear() or destruction time independently. This is a modified version that does, targeting C++/CLI (the successor to Managed C++). If it doesn't compile for VS 2010, modify the signature of main.

#include "stdafx.h"
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <hash_map>
#include <iostream>
#include <sstream>
#include <time.h>

using namespace System;

namespace Test
{
    typedef stdext::hash_map<unsigned int, float> Collection;
    typedef std::pair<unsigned int, float> Pair;

    const int Iterations = 40000;
    const int Multiple = 1000;
    const float Increment = 0.004f;

    clock_t startTime = 0;

    std::string ToDisplayString(double count)
    {
        const double Million = 1000000;
        const double Thousand = 1000;

        std::stringstream stream;

        if (count > Million) {
            stream << (count / Million) << "M";
        } else if (count > Thousand) {
            stream << (count / Thousand) << "K";
        } else {
            stream << count;
        }
        return stream.str();
    }

    void ResetClock()
    {
        startTime = clock();
    }
    double GetElapsedSeconds()
    {
        return (clock() - startTime) / (double)CLOCKS_PER_SEC;
    }

    void ClearSample()
    {
        std::cout << "TestClear()\n";

        Collection map;
        double duration;
        ResetClock();

        for (int i = 0; i < Iterations; i++) {
            if (i % Multiple == 0) {
                if (i != 0) {
                    std::cout << ' ';
                }
                std::cout << i;
            }
            map.insert(Pair(i, i + Increment));
        }

        duration = GetElapsedSeconds();
        std::cout << "\nInsertion: " << duration << "s";

        ResetClock();
        map.clear();
        duration = GetElapsedSeconds();
        std::cout << "\nClear: " << duration << "s";
    }

    void DestructorSample()
    {
        std::cout << "TestDestructor()\n";

        double duration;
        {
            // Moved to a block so we can time destruction.
            Collection map;
            ResetClock();

            for (int i = 0; i < Iterations; i++) {
                if (i % Multiple == 0) {
                    if (i != 0) {
                        std::cout << ' ';
                    }
                    std::cout << i;
                }

                map.insert(Pair(i, i + Increment));
            }

            duration = GetElapsedSeconds();
            std::cout << "\nInsertion: " << duration << "s";
            ResetClock();
        }
        duration = GetElapsedSeconds();
        std::cout << "\nDestruction: " << duration << "s";
    }
}

int main(array<String^>^ args)
{
    std::cout << "Testing with " << Test::ToDisplayString(Test::Iterations) << " items:\n\n";

    Test::ClearSample();
    std::cout << "\n\n";
    Test::DestructorSample();

    std::cin.get();
    return 0;
}

These are average times I measured over many runs:
ClearSample()
Insertion: 0.45s
Clear: 0.021s

DestructorSample()
Insertion: 0.4s
Destruction: 0.013s

Koby Duck
  • Unfortunately this doesn't work for me. I have an unordered_map with 40,000 items, and it is taking a long time to clear(). It has been 3 minutes already and it still hasn't finished. – BoBoDev Oct 20 '17 at 00:50
  • I ran roughly the same test with 40k, 50k, and 60k items, and each was under half a second. This was done on an Intel i7 4700U (a laptop CPU). I think the problem lies elsewhere. Make sure you're profiling correctly. – Koby Duck Oct 20 '17 at 03:06
  • Please post a minimal example demonstrating the problem. – Koby Duck Oct 20 '17 at 06:29
  • Hello Koby, I have updated the question with everything I did to reproduce it in a new project. FYI, I removed the symbols around the using code because they didn't show up on the webpage. – BoBoDev Oct 28 '17 at 01:10
  • Dear Koby, I updated my original question and am using your code on my side. I realized that when I lower the iterations to 30K, insertion and clear() are vastly faster. When I reach the 32K insertion mark, there is a big pause, then it quickly adds the rest of the items in the 40K case. And when this happens, clear() is vastly slower. I also tried unordered_map, which has the same problem. – BoBoDev Oct 30 '17 at 19:09
  • Dear Koby, thanks for the help. It seems to me it is most likely a VS2010 issue, because it is blazing fast on 2015. I used 4M iterations since this VM uses 6 GB of RAM instead of 4 GB; still blazing fast. Both Win7 and VS2010 are way too old anyway. Not sure how to mark this as answered, so I just upvoted your post. Thank you for looking into this for me. – BoBoDev Oct 30 '17 at 19:38
  • Glad I could help. I too noticed fairly large performance differences between VS 2010 and later versions, even 2013. That's one of the many reasons I upgraded. I suspect it's a combination of many compiler and library optimizations. See [this post](https://meta.stackexchange.com/a/5235/372676) on how to mark answers. Note that you're under no obligation to mark this as the answer, but I would appreciate it. – Koby Duck Oct 30 '17 at 19:45
  • Haha, the check mark is so big, I didn't see it. Thanks for the instruction and more info on VS performance difference =) – BoBoDev Oct 30 '17 at 19:49