
I want to insert a function call after every line of C++ code. For example:

for (int i = 0; i < n; ++i)
{
   double d = 2*i;
}

would become

for (int i = 0; i < n; ++i)
{
   myFuncCall();
   double d = 2*i;
   myFuncCall();
}
myFuncCall();

I have been researching general-purpose C++ parsers, but they all seem to be a) commercial, b) incomplete, or c) difficult to use.

Compilers aren't my life and this is a means to an end, so I am looking for the fastest solution.

EDIT: The reason I want to do this is we are chasing a nightmare bug where code crashes in release mode but not debug mode. For reasons beyond our control, we can't compile release code with debug symbols, so we are trying to make progress with random print statements. If I could make this work, we would at least immediately know where the code crashes because the inserted statements would act like a trace.
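As a rough illustration of the trace idea, here is a minimal sketch of what `myFuncCall` could expand to. The names `TraceState`, `traceState`, `traceHit`, and `loopExample` are hypothetical and not part of the original question; the sketch assumes it is acceptable to write to `stderr` and that the code is single-threaded.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper state: the most recently reached source location.
struct TraceState {
    const char* file = "";
    int line = 0;
};

// Function-local static avoids initialization-order problems.
inline TraceState& traceState() {
    static TraceState state;
    return state;
}

// Records the location and prints it immediately.
inline void traceHit(const char* file, int line) {
    assert(file != nullptr);
    TraceState& s = traceState();
    s.file = file;
    s.line = line;
    std::fprintf(stderr, "%s:%d\n", file, line);
    std::fflush(stderr);  // flush so the last location survives a crash
}

// The inserted call; the macro captures the location of each insertion point.
#define myFuncCall() traceHit(__FILE__, __LINE__)

// The loop from the question after instrumentation (sum added so it returns a value).
inline double loopExample(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
    {
        myFuncCall();
        double d = 2*i;
        myFuncCall();
        sum += d;
    }
    myFuncCall();
    return sum;
}
```

The last location printed before the crash bounds where the failure occurred. Note that adding calls like this can itself change timing and optimization, so the crash may move or disappear.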

Thanks Andrew

  • The *proper* way just *is* that hard - sorry, but that's fact. A dirty hack may be a billion times easier and work in 80% of all cases (or work even better but be not quite as easy) though, so if you know the people who'll use this know what they're doing... –  Jul 08 '11 at 20:35
  • What's the issue with existing free parsers? Do you need macro support or C++0x? I'm pretty sure there is a working solution in Python or Java... – Karoly Horvath Jul 08 '11 at 20:36
  • @yi_H: I'm pretty sure there's no existing solution in Python or Java. There are only a few robust, mature, widely-used C++ frontends, and they all are part of a major compiler (except the Edison Design Group one, which is compiler-agnostic but used in *several* compilers) and written in C or C++. C++ simply is a comparatively hard language to parse. And AFAIK only one, Clang, was designed to be easily usable as library by other applications. –  Jul 08 '11 at 20:40
  • @Andrew S.: you should tell us why you want to do this, there may exist a different solution for the problem behind the problem you described. – Doc Brown Jul 08 '11 at 20:45
  • Do you want to add something after every *statement*, or after every non-blank line? You say the former and demonstrate the latter in your sample code. – Mark B Jul 08 '11 at 20:51
  • Good point. Line resolution would be enough and I have modified the question accordingly - thanks –  Jul 08 '11 at 20:52
  • Can you at least get the PC and stack contents from the crash point, just as a hex dump? If so, sit down with your trusty linker map and the assembly output from the compiler, a strong cup of coffee and a smart colleague.. – Roddy Jul 08 '11 at 20:59
  • Is the problem really that you _cannot_ compile with debug symbols, or are you not allowed to _distribute_ a binary with debug symbols to whoever is experiencing the crash (client?)? The reason why I'm asking is, if you just can't distribute the symbols, you could keep the symbols "secret" and have the client report the crash address. Using `addr2line`, this translates into a source code position. – Damon Jul 08 '11 at 21:10
  • @roddy, @damon. We've tried lots of ways to debug this...more than the scope of this question. I think this will be of value to us if we can get it to work. –  Jul 08 '11 at 21:21
  • @delnan: Our DMS Toolkit is designed specifically to enable the construction of custom analyzers/instrumenters/transformers. See my answer. – Ira Baxter Jul 08 '11 at 21:44
  • What OS are you using? On a Windows system you can have the linker generate a map file and then use the crash address to look up the function where the crash happened. Depending on the level of detail you generate in the map file, you can sometimes narrow it down to a few lines. It gets hard because in release mode, code may be rearranged, but if you know which function crashes you have a place to start. – Ferruccio Jul 08 '11 at 23:45
  • If your program can be compiled under Linux, try running it under valgrind; that might show you the source of the bug. – Jeremy Friesner Jul 11 '11 at 05:08
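The "dirty hack" mentioned in the comments could be sketched as a purely line-based rewriter. This is not a parser: it assumes one statement per line, and it will produce code that does not compile for class, struct, and enum bodies, braced initializers, code at namespace scope (including the closing brace of a function), statements spanning several lines, and an unbraced `if` followed by `else`. The function names `rtrim` and `instrumentLines` are made up for this illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the text with trailing spaces, tabs and carriage returns removed.
static std::string rtrim(const std::string& s) {
    const std::size_t end = s.find_last_not_of(" \t\r");
    return end == std::string::npos ? std::string() : s.substr(0, end + 1);
}

// Naive heuristic (an assumption, not real C++ parsing): emit `call` after
// every line whose last non-blank character is ';', '{' or '}'.
std::string instrumentLines(const std::string& source, const std::string& call) {
    assert(!call.empty());
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';
        const std::string t = rtrim(line);
        if (t.empty()) continue;
        const char last = t.back();
        if (last == ';' || last == '{' || last == '}') {
            // Reuse the line's indentation; indent one level after '{'.
            std::string indent = line.substr(0, line.find_first_not_of(" \t"));
            if (last == '{') indent += "   ";
            out << indent << call << '\n';
        }
    }
    return out.str();
}
```

Called as `instrumentLines(text, "myFuncCall();")` on the loop from the question, this reproduces the instrumented version shown there. Anything beyond such simple code needs manual cleanup or a real C++ front end.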

2 Answers


Just to ask the obvious question: can you compile release mode with a separate symbol file?

If not, I would actually suggest a manual "binary search" approach rather than putting prints on every line. The problem with so many print statements is they can both slow down your program and unintentionally change its observable behavior. The fewer you can get away with the better.

Mark B
  • +1. If the code is too big to insert the debug code manually, it definitely is too big to print something on each line... maybe just at the start of the functions would be enough too – Christian Goltz Jul 08 '11 at 21:08

Our DMS Software Reengineering Toolkit with its C++ Front End could do this pretty easily. Yes, it's commercial, but I don't think there are many real noncommercial solutions to your task.

DMS is a program transformation engine that parses, analyzes, and transforms code according to a supplied language definition. Its C++ front end is the language definition for a variety of dialects of C++. As part of the parsing process for C++, DMS can build up compiler-accurate symbol tables. This is needed for OP's task, to distinguish otherwise ambiguous syntax which might be a statement from the alternative declarations such syntax might represent. (See Why can't C++ be parsed with a LR(1) parser? for examples of this).

The value in DMS for this task is that it allows source-to-source transformations to be applied to the abstract syntax trees produced by the parse. The following DMS rule, written in DMS rule syntax, would probably be pretty close to what OP needs:

  domain Cpp~ANSI;

  rule instrument_statements(s: executable_statement):
       executable_statement->executable_statement
  " \s " ->  " { \s ; post_statement_call(); } "

The text inside the meta quotes " ... " is target domain syntax, in this case ANSI C++. The \ is a meta escape; \s represents any executable statement. What this rule does is match all syntactic executable_statements, and replace them by a block of two statements, the first being the original statement, the second being whatever OP wants done after each statement. I've assumed OP simply wanted to call a function, but he may want something more complex here, perhaps involving printing line numbers, function names, or function parameters [requiring some additional transformation rules].

The pattern matching and the transformation are done using parsed syntax trees, so it can't get confused by the presence of something that looks like code in a string, or a comment, or isn't actually an executable statement (e.g., is a declaration), etc. [There's a minor detail I glossed over to prevent this rule from being applied recursively to its results, but that detail is easily managed using DMS's APIs] After the transformation, the modified syntax tree is regenerated into compilable C++ source text. OP would compile and run that code instead of his original code.

Note that post_statement_call need not actually print anything. If it calls a central function, OP can code whatever predicates/print statements he wants to control the amount of output/overhead that post_statement_call consumes. In essence, this can act as a programmable breakpoint.
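To make the "programmable breakpoint" idea concrete, here is a minimal sketch of such a central function. Only the name `post_statement_call` comes from the rule above; `ProbeConfig`, `configureProbe`, `probeHits`, and `shouldReport` are hypothetical, and the skip/every-Nth predicate is just one possible policy. The sketch assumes single-threaded code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical settings that decide which probe hits are reported.
struct ProbeConfig {
    unsigned long skipFirst = 0;   // stay silent for this many hits
    unsigned long printEvery = 1;  // afterwards report every Nth hit
};

inline ProbeConfig& probeConfig() { static ProbeConfig c; return c; }
inline unsigned long& probeHits() { static unsigned long n = 0; return n; }

// Adjusts the policy, e.g. once at program start.
inline void configureProbe(unsigned long skipFirst, unsigned long printEvery) {
    assert(printEvery > 0);
    probeConfig().skipFirst = skipFirst;
    probeConfig().printEvery = printEvery;
}

// The predicate: true when hit number `hit` (1-based) should be reported.
inline bool shouldReport(unsigned long hit, const ProbeConfig& c) {
    if (c.printEvery == 0 || hit <= c.skipFirst) return false;
    return (hit - c.skipFirst) % c.printEvery == 0;
}

// The central function that every inserted probe calls.
inline void post_statement_call() {
    const unsigned long hit = ++probeHits();
    if (shouldReport(hit, probeConfig())) {
        std::fprintf(stderr, "probe hit %lu\n", hit);
        std::fflush(stderr);  // flush so the output survives a crash
    }
}
```

With `configureProbe(1000000, 1)`, for instance, the probes stay quiet for the first million hits and then report every one, which keeps the output manageable when the crash happens late in a run.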

This basic idea of inserting probes by transformational methods is used in our line of COTS test coverage and profiling tools, all based directly on DMS, including our C++ Test Coverage Tool. For more details, see http://www.semdesigns.com/Company/Publications/TestCoverage.pdf

OP might find that using our test coverage tool is an easier way to accomplish something pretty close. What the test coverage tool does is insert special trace-data capture statements at the beginning of every block of (unconditional) code, rather than after every individual statement. That trace-data capture is actually a macro invocation, whose code we supply in source form as part of the test coverage product. While it is not the intended use, one could simply redefine that macro to perform the desired tracing, and the test coverage tool would in effect insert the desired code where it would have placed the probes. He can probably still capture the function name, and a unique code point [the test coverage tool manufactures these] as a stand-in for the line number. What he could not do is more sophisticated tasks that would be possible with DMS proper. For instance, there's no way the trace macro can get its hands on the original line number; that's lost by the time the macro is introduced by the test coverage tool. (With DMS proper, it isn't lost). But there is a way to convert the "unique code point" back into precise source location information.

EDIT 7/10/2011: OP might even find that running the test coverage tool as a test coverage tool might help him, too. If the test-coverage compiled application is executed, and doesn't crash, the "covered code" of that execution has run at least once and is therefore somewhat less likely to be the source of the problem. (No guarantees: just because it executed doesn't mean it's correct). But the hint is that the problem is someplace else; this would tend to eliminate code that isn't the problem.

Ira Baxter
  • First time I've upvoted an answer that's plugging the answerer's own product! Looks like a very interesting and powerful toolkit. – Roddy Jul 08 '11 at 21:59
  • I would love to make this the accepted answer, but cannot because I (and I suspect many others) don't have the funds to buy a solution for this problem –  Jul 10 '11 at 02:35
  • Downvoters (flaggers as far as I can tell): Can you explain your objections? This directly addresses OP's request. – Ira Baxter Jul 11 '11 at 03:00
  • @downvoters (and flaggers): Answers which advertise a product are only bad if they are irrelevant or hide the poster's relationship with the product. Ira is a significant contributor on SO and AFAIS, has always done a good job of full disclosure, and usually a good job of only recommending his product when it actually does solve the problem. – Ben Voigt Jul 11 '11 at 03:12