4

I have never done profiling. Yesterday I programmed a ProfilingTimer class with a static timetable (a map<std::string, long long>) for time storage.

The constructor stores the starting tick, and the destructor calulates the elapsed time and adds it to the map:

ProfilingTimer::ProfilingTimer(std::string name)
 : mLocalNameLength(name.length())
{
 sNestedName += name;
 sNestedName += " > ";

 mStartTick = Platform::GetTimerTicks();
}

ProfilingTimer::~ProfilingTimer()
{
 long long totalTicks = Platform::GetTimerTicks() - mStartTick;

 sTimetable[sNestedName] += totalTicks;

 sNestedName.erase(sNestedName.length() - mLocalNameLength - 3);
}

In every function (or {block}) that I want to profile i need to add:

ProfilingTimer _ProfilingTimer("identifier");

This Profiling work all fine when I build a release version from Visual C++ 2010 Professional. But when I build as Debug I get a huge fps drop (from 63 down to ~20).

These are the numbers I get when I print my timetable (Debug build):

Update() > Tower::Update > : 2551 ms (84100m%)
Update() > Tower::Update > Tower::Update1 > : 1313 ms (43284m%)
Update() > Tower::Update > Tower::Update1 > Tower::FindNewTarget > : 6 ms (204m%)
Update() > Tower::Update > Tower::Update1 > Tower::HasTargetInRange > : 5 ms (184m%)
Update() > Tower::Update > Tower::Update2 > : 659 ms (21756m%)
Update() > Tower::Update > Tower::Update2 > Tower::HasTargetInRange > : 5 ms (187m%)

Update1 and Update2 times the first and second half respectively of Update. Why doesnt they add up to 84,1 %?

Still this 84% is a huge number - in release build I get this output:

Update() > : 770 ms (1549m%)
Update() > Tower::Update > : 722 ms (1452m%)
Update() > Tower::Update > Tower::FindNewTarget > : 44 ms (89m%)
Update() > Tower::Update > Tower::HasTargetInRange > : 92 ms (187m%)

1,4% instead of 84,1%. That is a huge difference!

Anyone knows why?

EDIT: I guess release is much faster than Debug, but why is this profiling so time consuming? Is std::map the time hogger or am I doing something extremely wrong?

EDIT: Updated the code. The initiation wasn't needed, and now stores the length of mLocalName instead of the actual string.

Moberg
  • 5,253
  • 4
  • 38
  • 54
  • You can find out why anything is slow by just pausing it a few times: http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024 – Mike Dunlavey Nov 21 '10 at 01:39
  • 1
    It's a little button with two vertical bars (||). Get your code running, and while it's running, hit the button. If it runs too quickly to pause, wrap a long loop around it. When you pause it, examine the call stack. That will tell you exactly what's going on. If you see it doing the same thing (however you define "same") two or more times you pause it, you know that's taking a lot of time. If you have >1 thread, look at each one, but ignore the one that's waiting for user input. – Mike Dunlavey Nov 21 '10 at 14:19
  • If you don't mind my getting technical, suppose you pause it 3 times, and 2 of those times you see statement X on the stack, so X by itself costs about 2/3 of the time. By "about" I mean, take 4 uniform(0,1) random numbers and sort them. The distribution of the 3rd one is the same as the fraction of time X takes. You don't know what it is, but you know it's a lot. – Mike Dunlavey Nov 21 '10 at 17:18

3 Answers3

2

Microsoft adds a lot of safety checking to their container libraries in debug mode. This is beneficial. You'd rather catch out of bounds exceptions and the like in functions such as vector::operator[] instead of deciphering memory corruptions (would still recommend calling vector::at.) However, there's also a ton of other things that go into inserting debugger hooks that impact your code and it's performance.

wheaties
  • 35,646
  • 15
  • 94
  • 131
2

Using long strings as an index into a std::map is probably not the fastest way to do this. Long strings with common beginnings mean that each time you compare two of these strings, a lot of characters have to be looked at to see if the strings are equal or not. std::map is basically a binary tree and does O(log(n)) comparisons on each lookup/insert. Using shorter or numeric keys should speed up all the map operations.

Also, using std::unordered_map might (or might not) improve speed, depending on how many elements the map contains and what kind of keys are used.

In the profiling class, the constructor should take the name as a reference to avoid creating an unnecessary copy of that string:

ProfilingTimer::ProfilingTimer(const std::string &name)

Generally avoiding unnecessary copies is a good idea. For example, you maybe don't really need the mLocalName member and it could be enough to just store the string length instead.

The profiling functions are probably called very often, so small delays can add up over the run of the program.

sth
  • 222,467
  • 53
  • 283
  • 367
  • Yes too often. Instead of having in the function we moved it outside the for loops calling these funcitons. Much better perfomance! :) But I still think it need's some improvement.. Also thanks for the tip about storing the length instead of the actual string mLocalName! – Moberg Nov 21 '10 at 13:16
2

There are a couple performance issues with your code.

  1. You're needlessly constructing an std::string in your ProfilingTimer constructor. I would recommend using a const char* as your parameter and using a custom Twine/Rope structure to do your appends.
  2. Why does mLocalName even exist? Just refer to name directly.
  3. As has been previously mentioned, don't ever profile in debug mode. It is beyond useless.
  4. Maps are actually quite slow in practice. I suggest using a hashtable. Unfortunately, implementations are compiler specific. If you're using Microsoft's, I believe they have unordered_map available for your use.
  5. Instead of doing sTimetable[sNestedName] = 0;, use the iterator you already retrieved.

    Timetable::iterator loc = sTimetable.find(sNestedName);
    if(loc == sTimetable.end())
        sTimetable[sNestedName] = 0;
    

Addendum: Visual Studio comes with a profiler last I checked. Why not just use that?

sth
  • 222,467
  • 53
  • 283
  • 367
Clark Gaebel
  • 17,280
  • 20
  • 66
  • 93
  • Is there any STL:: implementation of ropes? Yea I read about that VS profiler somewhere, but I don't know how to use it, plus it's fun to do your own code ;) – Moberg Nov 21 '10 at 00:16
  • I agree that it's fun to do this stuff, that's how I know about it ;) I don't think VS has a native Rope implementation, but there is a public one that I know about: LLVM's. http://llvm.org/docs/doxygen/html/Twine_8h_source.html – Clark Gaebel Nov 21 '10 at 05:03
  • My friend tried to change all string to char*, and let the map order it's content based on the pointers (no more alphabetical order but that's OK). But it became slower! >.< Not storing the mLocalName however made it a bit faster. :) – Moberg Nov 21 '10 at 13:13