I want to profile a DLL plugin written in C++. I have access to the source (I am the author/maintainer) and can modify it if needed for instrumentation. What I don't have is the source, symbols, etc. of the host program that calls the DLL; I only have the headers needed to build the plugin. The DLL is invoked upon action from the client.

What is the best way to profile the code? It is not realistic to "wrap" an executable around the DLL, and it would not be useful anyway: the plugin calls some functions from the host and I need to profile those paths too, so a wrapper would skew the performance.

EDIT after Kieren Johnstone's comment: Ideally I would like to hook into the loaded DLL just as the debugger can (attaching to the running host process and placing a breakpoint somewhere in the DLL as needed). Is that possible? If not, I will need to post another question to ask why :-)

I am using the TFS edition of Visual Studio 2010.

Bonus points for providing suggestions/answers for the same task under AIX (ah, the joys of multiple environments!).

Francesco
  • What profiler are you using? Did you simply try running the host program? If you have the symbols for the plugin, it should still work. – Kieren Johnstone Nov 12 '11 at 15:54
  • The profiler built into VS2010. I will try unwrapping the start procedure of the host program, but it's not exactly easy because it requires a variety of other programs/connections. It's not a simple foo.exe... It will take me a bit of time, so I thought I would check whether there is some way to "hook" into the loaded DLL, just as the debugger can. – Francesco Nov 12 '11 at 15:58
  • Enable the profiler for your DLL in your solution, then copy the whole program tree into your executable folder or make VS put your DLL in the program folder. Set the executable file name to foo.exe and run the profiler... – joy Nov 12 '11 at 16:17
  • @neagoegab what do you mean by "copy the whole program tree into your executable folder"? If I understand correctly, "enabling the profiler" should let me link the DLL to the name of the executable. Could you please spell out a complete answer so that I can upvote you? – Francesco Nov 12 '11 at 17:46

2 Answers

This is possible albeit a little annoying.

  1. Deploy your plug-in DLL to where the host application needs it to be
  2. Launch your host application and verify that it is using your plug-in
  3. Create a new Performance Session
  4. Add the host EXE as a target in the Session from step 3
  5. Select Sampling or Instrumentation for your Session
  6. Launch the profiling session

During all this keep your plug-in solution loaded and VS should find the symbols for your plug-in automatically.

linuxuser27
  • This seems to be what I had in mind. It is important for me to be able (in your step 4) to specify the actual running process as the host EXE, without restarting it (specifying command line parameters and so on). I will try it as soon as possible, but if this works, yours is the answer I was looking for :) – Francesco Nov 12 '11 at 21:51

Not sure about VS10, but in older versions you debug the DLL by specifying the EXE that runs it.

Let's split the problem into two parts: 1) locating what you might call "bottlenecks", and 2) measuring the overall speedup you get by fixing each one.

(2) is easy, right? All you need is an outer timer.
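As a rough illustration of such an outer timer, here is a minimal sketch. It assumes a C++11 toolchain with `<chrono>`; as far as I know VS2010 does not ship that header, so there something like `QueryPerformanceCounter` would have to play the same role. The helper name `time_call` is hypothetical, not part of any host or plugin API.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double time_call(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();  // the whole plugin entry point, including calls back into the host
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping the plugin's entry point this way, once before and once after a change, gives the two numbers whose ratio is the overall speedup. Because the timer sits outside everything, time spent in host functions called by the plugin is included.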

That leaves (1). If you're like most people, you think that finding the "bottlenecks" cannot be done without some kind of precision timing of the parts of the program. Not so, because most of the time the things you need to fix to get the most speedup are not things you can detect that way. They are not necessarily bad algorithms, or slow functions, or hotspots. They are distributed things being done by perfectly innocent-looking, well-designed code that just happen to present a huge speedup opportunity if coded in a different way.

Here's an example where a reasonably well-written program had its execution time reduced from 48 seconds to 20, 17, 13, 10, 7, 4, 2.1, and finally 1.1, over 8 iterations.** That's a compound speedup factor of over 40x. The speedup factor you can get differs from program to program: some can get less, some can get more, depending on how close they are to optimal in the first place. There's no mystery about how to do this. The method was random pausing. (It's an alternative to using a profiler. Profilers measure various things, and give you various clues that may or may not be helpful, but they don't reliably tell you what the problem is.)

** The speedup factors achieved, per iteration, were 2.38, 1.18, 1.31, 1.30, 1.43, 1.75, 1.90, 1.91. Another way to put it is the percent time reduced in each iteration: 58%, 15%, 24%, 23%, 30%, 43%, 48%, 48%. I get a hard time from profiler fans because the method is so manual, but they never talk about the speedup results. (Maybe that will change.)
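The arithmetic behind those figures can be sketched as follows. This is only an illustration with hypothetical function names; it takes the list of measured running times (which must be non-empty) and derives the per-iteration speedup factors, the fraction of time removed in each iteration, and the compound factor. Fed with the rounded times quoted above, the first factor comes out as 2.40 rather than 2.38, presumably because the original measurements had more digits.

```cpp
#include <cstddef>
#include <vector>

// Speedup factor of each iteration: time before the fix / time after it.
std::vector<double> step_speedups(const std::vector<double>& times) {
    std::vector<double> factors;
    for (std::size_t i = 1; i < times.size(); ++i)
        factors.push_back(times[i - 1] / times[i]);
    return factors;
}

// Fraction of the running time removed by each iteration (0.58 means 58%).
std::vector<double> step_reductions(const std::vector<double>& times) {
    std::vector<double> reductions;
    for (std::size_t i = 1; i < times.size(); ++i)
        reductions.push_back(1.0 - times[i] / times[i - 1]);
    return reductions;
}

// Compound speedup: the product of all step factors, which telescopes
// to first time / last time. Assumes `times` is non-empty.
double compound_speedup(const std::vector<double>& times) {
    return times.front() / times.back();
}
```

With the times `{48, 20, 17, 13, 10, 7, 4, 2.1, 1.1}`, `compound_speedup` returns 48 / 1.1, which is the "over 40x" figure.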

Mike Dunlavey
  • Thanks Mike, the links are very interesting and I appreciate your Bayesian explanation (so duly upvoted). I will surely try your approach too, which has the added advantage of being applicable across platforms. – Francesco Nov 12 '11 at 22:13
  • @Francesco: Plenty of people know this method. The idea that there can be things you can fix to save time that are not localized to routines or even lines of code, but can be quickly found by taking a good look at samples, is illustrated in *[this post](http://stackoverflow.com/questions/7916985/what-is-boilerplate-code-hot-code-and-hot-spots/7923574#7923574)*. – Mike Dunlavey Nov 13 '11 at 00:00
  • @Francesco: If you're statistically inclined, the principle behind it can also be explained with beta, binomial, or negative binomial distributions, but you don't need to know that to use it. – Mike Dunlavey Nov 13 '11 at 00:16
  • Thanks for the link to your answer. The statistical principle is perfectly clear to me :) I would say that the approach aims at finding a global optimization as opposed to local optimizations. A global optimization *could* be obtained by a series of local ones (if the "potential", in physicists' jargon, is smooth enough), but that is by no means assured. And all the relevant issues of finding global minima apply. – Francesco Nov 13 '11 at 09:00