12

I need to performance analyse some real-time code on Windows.

I can't use any of the regular profilers (Vtune,Codeanalyst) because I can't slow the executable down. So I'm using my own timing class based on QueryPerformanceCounter().

Are then any existing (free) C++ libs to log the results - BUT I need them to be buffered to the end of a run, I can't afford the time to write to a file or event log while I'm collecting data.

Easy enough to roll my own, but if there is a way of making log4cplus / qDebug or similar log to a buffer and dump later it would save some effort.

Martin Beckett
  • 94,801
  • 28
  • 188
  • 263
  • 1
    Note that the act of logging is still going to slow your executable down by trashing the cache and consuming memory bandwidth. Also, IIRC, QueryPerformanceCounter() is substantially slower than directly calling `RDTSC`. – EmeryBerger Jan 18 '11 at 17:39
  • 2
    Also, VTune on Windows can do event-based sampling, which will not slow your code down *at all*. – EmeryBerger Jan 18 '11 at 17:40
  • 1
    Or you can try xperf and ETW. In your case, since you have VTune, EmeryBerger already have the solution commented above. – Peon the Great Jan 18 '11 at 17:56
  • Vtune can't tell you the time between getting a network event and you finishing processing the result. Xperf etc tell me that the machine is spending lots of time in kernal functions but not why the FPS is low. AMD can't do event based on an Intel CPU and VTUNE is pricey – Martin Beckett Jan 18 '11 at 18:13
  • @EmeryBerger - yes you can't use it to time an integer add. But at the ms level to know if some call is blocking, or if a task is taking longer than I think it's fine. – Martin Beckett Jan 18 '11 at 18:15
  • Measuring is one thing, but if you want to see where changes could be made to speed it up, here's what I've done. Get the thing humming along and then kill it by hand. Record the stack(s). Do it several times. This doesn't slow it down at all, it just stops it. Any code you see on two or more samples, if you can optimize it, will give you a healthy speedup. – Mike Dunlavey Jan 18 '11 at 20:30

1 Answers1

12

I wrote a class that does this, too, because I couldn't find anything that did what I wanted but was still easy to incorporate in to existing code with copy-paste coding techniques.

(edit) Note that as mentioned in the comments, the act of timing itself changes the nature of what you're timing. But then I'm sure you understand this all too well. This class has still been very useful to me in finding hotspots & bottlenecks, in addition to giving me at least a rough high-side estimate of how long certain code takes to run. The code I present here was never intended to be production-worthy. Use it as such.

Here's a sample output of my class:

Timer Precision = 385 ns


=== Item               Trials    Ttl Time  Avg Time  Mean Time StdDev    ===
    DB Connect         1         0.11 s    0.11 s    0.11 s    0.0000 ps
    Entry Parse        3813      0.33 s    86 us     82 us     18 us
    File Load          1         50 ms     50 ms     50 ms     0.0000 ps
    Option Parse       1         5 ms      5 ms      5 ms      0.0000 ps
    Record Update      3812      0.53 s    140 us    127 us    49 us
============================================================================

Each "Item" is one section of code I wanted to time. The Timer object uses RAII (via auto_ptr) to allow automatic start & stop. Here's some sample client-side code demonstrating how to use it:

Sample Code:

int main(int ac, char* av[])
{
    std::auto_ptr<dbg::Timer> section_timer(new dbg::Timer("Option Parse"));
    // MAGIC: Parse the command line...

    section_timer = auto_ptr<dbg::Timer>(new dbg::Timer("File Load"));
    // MAGIC: Load a big file...        

    section_timer = auto_ptr<dbg::Timer>(new dbg::Timer("DB Connect"));
    // MAGIC: Establish a DB connection...

    for( /* for each item in the file*/  )
    {
      section_timer = auto_ptr<dbg::Timer>(new dbg::Timer("Entry Parse"));
      // MAGIC: Parse the incoming item
      section_timer = auto_ptr<dbg::Timer>(new dbg::Timer("Record Update"));
      // MAGIC: Update the database
    }

    // Dump the output shown above    

    cout << dbg::Timer::DumpTrials() << endl;
}

The implementation of this class is in an include library. You don't need to compile anything. I use Boost.Format to do the string formatting, but this could be replaced simply enough.

Here is the implementation (my actual implementation splits this in to two files, core.h and core_dbg.hpp; the latter is #includeed directly from the former).

Implementation:

namespace dbg
{
    class Timer
    {
    public:
        inline Timer(const std::string & name);
        inline Timer(const Timer& rhs);
        inline virtual ~Timer();
        inline std::string format_now() const;
        double current_elapsed() const;

        Timer& operator=(const std::string& name);

        static std::string DumpTrials(bool ClipboardAlso=false);            
        static void Reset();
        static std::string format_elapsed(double d) ;

    private:
        typedef std::string key_type;
        typedef double time_type;
        typedef std::multimap<key_type, time_type> times;

        static __int64 TimerFreq();

        LARGE_INTEGER startTime_, stopTime_;
        mutable LARGE_INTEGER intermediateTime_;
        std::string label_;

        static times& TimeMap();

        struct extract_key : public std::unary_function<times::value_type, key_type>
        {
            std::string operator()(times::value_type const & r) const
            {
                return r.first;
            }
        };

        struct extract_val : public std::unary_function<times::value_type, time_type>
        {
            time_type operator()(times::value_type const & r) const
            {
                return r.second;
            }
        };
        struct match_key : public std::unary_function<times::value_type, bool>
        {
            match_key(key_type const & key_) : key(key_) {};
            bool operator()(times::value_type const & rhs) const
            {
                return key == rhs.first;
            }
            match_key& operator=(const match_key& rhs)
            {
                key = rhs.key;
                return * this;
            }
        protected:
            key_type key;
        };

        struct accum_key : public std::binary_function<time_type, times::value_type, time_type>
        {
            accum_key(key_type const & key_) : key(key_), n(0) {};
            time_type operator()(time_type const & v, times::value_type const & rhs) const
            {
                if( key == rhs.first )
                {
                    ++n;
                    return rhs.second + v;
                }
                return v;
            }
        private:
            times::key_type key;
            mutable size_t n;
        };
    };

    inline Timer::Timer(const std::string & name)
    {
        label_ = name;
        QueryPerformanceCounter(&startTime_);
    }

    inline double Timer::current_elapsed() const
    {
        QueryPerformanceCounter(&intermediateTime_);
        __int64 clocks = intermediateTime_.QuadPart-startTime_.QuadPart;
        double elapsed = (double)clocks/(double)TimerFreq();
        return elapsed;
    }

    inline Timer::~Timer()
    {
        double elapsed = current_elapsed();
        stopTime_.QuadPart = intermediateTime_.QuadPart;
        TimeMap().insert(std::make_pair(label_,elapsed));
    };

    inline Timer& Timer::operator=(const std::string& name)
    {
        double elapsed = current_elapsed();
        stopTime_.QuadPart = intermediateTime_.QuadPart;
        TimeMap().insert(std::make_pair(label_,elapsed));
        label_ = name;
        QueryPerformanceCounter(&startTime_);
        return * this;
    }

    inline Timer::Timer(const Timer& rhs)
    {
        double elapsed = current_elapsed();
        stopTime_.QuadPart = intermediateTime_.QuadPart;
        TimeMap().insert(std::make_pair(label_,elapsed));
        label_ = rhs.label_;
        QueryPerformanceCounter(&startTime_);
    }

    inline std::string Timer::format_now() const
    {
        return format_elapsed(current_elapsed());
    }

    inline std::string Timer::DumpTrials(bool ClipboardAlso)
    {
        using boost::io::group;

        if( TimeMap().empty() )
        {
            return "No trials\r\n";
        }

        std::string ret = (boost::format("\r\n\r\nTimer Precision = %s\r\n\r\n")
            % format_elapsed(1.0/(double)TimerFreq())).str();

        // get a list of keys
        typedef std::set<std::string> keyset;
        keyset keys;
        std::transform(TimeMap().begin(), TimeMap().end(), std::inserter(keys, keys.begin()), extract_key());

        size_t maxrows = 0;

        typedef std::vector<std::string> strings;
        strings lines;

        static const size_t t = 9;

        std::string head = 
            (boost::format("=== %-s %-s %-s %-s %-s %-s ===") 
            //% (t*2) 
            //% (t*2) 
            % group(std::setw(t*2), std::setprecision(t*2), "Item" )
            //% t 
            //% t 
            % group(std::setw(t), std::setprecision(t), "Trials")
            //% t 
            //% t 
            % group(std::setw(t), std::setprecision(t),  "Ttl Time" )
            //% t 
            //% t 
            % group(std::setw(t), std::setprecision(t), "Avg Time" )
            //% t 
            //% t 
            % group(std::setw(t), std::setprecision(t),  "Mean Time" )
            //% t 
            //% t 
            % group(std::setw(t), std::setprecision(t),  "StdDev")
            ).str();

        ret += (boost::format("\r\n%s\r\n") % head).str();
        if( ClipboardAlso ) 
            lines.push_back("Item\tTrials\tTtl Time\tAvg Time\tMean Time\tStdDev\r\n");
        // dump the values for each key
        for( keyset::iterator key = keys.begin(); keys.end() != key; ++key )
        {
            time_type ttl = 0;
            ttl = std::accumulate(TimeMap().begin(), TimeMap().end(), ttl, accum_key(*key));
            size_t num = std::count_if( TimeMap().begin(), TimeMap().end(), match_key(*key));
            if( num > maxrows ) 
                maxrows = num;
            time_type avg = ttl / num;

            // compute mean
            std::vector<time_type> sortedTimes;
            dibcore::stl::transform_if(TimeMap().begin(), TimeMap().end(), std::inserter(sortedTimes, sortedTimes.begin()), extract_val(), match_key(*key));
            std::sort(sortedTimes.begin(), sortedTimes.end());
            size_t mid = (size_t)floor((double)num/2.0);
            double mean = ( num > 1 && (num % 2) != 0 ) ? (sortedTimes[mid]+sortedTimes[mid+1])/2.0 : sortedTimes[mid];

            // compute variance
            double sum = 0.0;
            if( num > 1 )
            {
                for( std::vector<time_type>::iterator cur = sortedTimes.begin(); sortedTimes.end() != cur; ++cur )
                    sum += pow(*cur-mean,2.0);
            }

            // compute std dev
            double stddev = num > 1 ? sqrt(sum/((double)num-1.0)) : 0.0;

            ret += (boost::format("    %-s %-s %-s %-s %-s %-s\r\n") 
                % group(std::setw(t*2), std::setprecision(t*2) , (*key))
                % group(std::setw(t), std::setprecision(t) , (boost::format("%d") %num).str() )
                % group(std::setw(t), std::setprecision(t) , format_elapsed(ttl).c_str() )
                % group(std::setw(t), std::setprecision(t) , format_elapsed(avg) )
                % group(std::setw(t), std::setprecision(t) , format_elapsed(mean) )
                % group(std::setw(t), std::setprecision(t) , format_elapsed(stddev)) ).str(); 

            if( ClipboardAlso )
                lines.push_back((
                boost::format("%s\t%s\t%s\t%s\t%s\t%s\r\n") 
                % (*key )
                % (boost::format("%d") %num)
                % format_elapsed(ttl) 
                % format_elapsed(avg) 
                % format_elapsed(mean)
                % format_elapsed(stddev)
                ).str());


        }
        ret += (boost::format("%s\r\n") 
            % std::string(head.length(),'=') ).str();

        if( ClipboardAlso )
        {
            // dump header row of data block
            lines.push_back("");
            {
                std::string s;
                for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                {
                    if( key != keys.begin() )
                        s.append("\t");
                    s.append(*key);
                }
                s.append("\r\n");
                lines.push_back(s);
            }

            // blow out the flat map of time values to a seperate vector of times for each key
            typedef std::map<std::string, std::vector<time_type> > nodematrix;
            nodematrix nodes;
            for( times::iterator time = TimeMap().begin(); time != TimeMap().end(); ++time )
                nodes[time->first].push_back(time->second);

            // dump each data point
            for( size_t row = 0; row < maxrows; ++row )
            {
                std::string rowDump;
                for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                {
                    if( key != keys.begin() )
                        rowDump.append("\t");
                    if( nodes[*key].size() > row )
                        rowDump.append((
                        boost::format("%f") 
                        % nodes[*key][row]
                    ).str());
                }
                rowDump.append("\r\n");
                lines.push_back(rowDump);
            }

            // dump to the clipboard
            std::string dump;
            for( strings::iterator s = lines.begin(); s != lines.end(); ++s )
            {
                dump.append(*s);
            }

            OpenClipboard(0);
            EmptyClipboard();
            HGLOBAL hg = GlobalAlloc(GMEM_MOVEABLE, dump.length()+1);
            if( hg )
            {
                char* buf = (char*)GlobalLock(hg);
                if( buf )
                {
                    std::copy(dump.begin(), dump.end(), buf);
                    buf[dump.length()] = 0;
                }
                GlobalUnlock(hg);
                SetClipboardData(CF_TEXT, hg);
            }
            CloseClipboard();
        }

        return ret;
    }

    inline std::string Timer::format_elapsed(double d) 
    {
        if( d < 0.00000001 )
        {
            // show in ps with 4 digits
            return (boost::format("%0.4f ps") % (d * 1000000000000.0)).str();
        }
        if( d < 0.00001 )
        {
            // show in ns
            return (boost::format("%0.0f ns")% (d * 1000000000.0)).str();
        }
        if( d < 0.001 )
        {
            // show in us
            return (boost::format("%0.0f us") % (d * 1000000.0)).str();
        }
        if( d < 0.1 )
        {
            // show in ms
            return (boost::format("%0.0f ms") % (d * 1000.0)).str();
        }
        if( d <= 60.0 )
        {
            // show in seconds
            return (boost::format("%0.2f s") % d).str();
        }
        if( d < 3600.0 )
        {
            // show in min:sec
            return (boost::format("%01.0f:%02.2f") % floor(d/60.0) % fmod(d,60.0)).str();
        }
        // show in h:min:sec
        return (boost::format("%01.0f:%02.0f:%02.2f") % floor(d/3600.0) % floor(fmod(d,3600.0)/60.0) % fmod(d,60.0)).str();
    }

    inline void Timer::Reset()
    {
        TimeMap().clear();
    }

    inline __int64 Timer::TimerFreq()
    {
        static __int64 freq = 0;
        static bool init = false;
        if( !init )
        {
            LARGE_INTEGER li;
            QueryPerformanceFrequency(&li);
            freq = li.QuadPart;
            init = true;
        }
        return freq;
    }

    inline Timer::times& Timer::TimeMap()
    {
        static times times_;
        return times_;
    } 
};  // namespace dbg
John Dibling
  • 99,718
  • 31
  • 186
  • 324
  • @Martin: You're welcome, I hope it helps. One quick note I forgot to mention. There's a number of things this class can do which I almost never use but thought I would. One of those is the ability to also copy the dump results to the Windows clipboard. This is the defaulted `false` parameter in `DumpTrials()` – John Dibling Jan 18 '11 at 18:24
  • Yeah, I guess we all did that. The problem with this profiling though is that due to slow operations (especially new()) that profiling code introduces, I found it's useless in profiling tight template code. Also when such heavy profile objects created, compiler stops inlining and your results will be completely off. – Gene Bushuyev Jan 18 '11 at 18:40
  • 1
    @Gene: As I said, the act of timing will itself alter the results. The intent of this device is not to tell you exactly how long a certain block of code takes to run. Such a device would be pointless, anyway, since code will take more or less time depending on the machine, other processes, the data set, etc. Even the temperature of the CPU. The intent is to give you an estimate of execution time relative to other parts of the code. For that purpose, it has served me well in all sorts of code. – John Dibling Jan 18 '11 at 18:52
  • Ye s- I simplified it to just allocate a fixed buffer of 1024 events with a time and a fixed length char[] message tag. Then a timestamp call is very cheap. – Martin Beckett Jan 20 '11 at 17:22