3

How do I detect function definitions which are never getting called and delete them from the file and then save it?

Suppose I have only 1 CPP file as of now, which has a main() function and many other function definitions (function definition can also be inside main() ). If I were to write a program to parse this CPP file and check whether a function is getting called or not and delete if it is not getting called then what is(are) the way(s) to do it?

There are few ways that come to mind:

  1. I would find out line numbers of beginning and end of main(). I can do it by maintaining a stack of opening and closing braces { and }.
  2. Anything after main would be function definition. Then I can parse for function definitions. To do this I can parse it the following way:

    < string >< open paren >< comma separated string(s) for arguments >< closing paren >
    
  3. Once I have all the names of such functions as described in (2), I can make a map with its names as key and value as a bool, indicating whether a function is getting called once or not.

  4. Finally parse the file once again to check for any calls for functions with their name as in this map. The function call can be from within main or from some other function. The value for the key (i.e. the function name) could be flagged according to whether a function is getting called or not.

I feel I have complicated my logic and it could be done in a smarter way. With the above logic it would be hard to find all the corner cases (there would be many). Also, there could be function pointers to make parsing logic difficult. If that's not enough, the function pointers could be typedefed too.

How do I go about designing my program? Are a map (to maintain filenames) and stack (to maintain braces) the right data structures or is there anything else more suitable to deal with it?

Note: I am not looking for any tool to do this. Nor do I want to use any library (if it exists to make things easy).

anurag86
  • 1,635
  • 1
  • 16
  • 31
  • 2
    I think the main issue is the C++-parser. You will most probably not get it working yourself, especially if you are worrying about whether using a map or stack, which I would regard minor issues ... So all I can say is: reconsider your restriction that you do not want to use another tool or library .. – coproc Oct 19 '15 at 08:10
  • 3
    Parsing C/C++ is difficult, I really recommend to use some library (as the one provided by clang). A edge case is *overload* resolution, where you have to differentiate between `void foo(std::string)` and `void foo(int)`. In addition `namespace` is an other difficulties. – Jarod42 Oct 19 '15 at 08:10
  • 1
    Have you tried Googling for http://www.google.com/?gws_rd=ssl#q=detect+unreferenced+symbols – CiaPan Oct 19 '15 at 08:52
  • @CiaPan : I tried to look at few questions but most of the solutions are either tool oriented or finding unreferenced symbols through object file through commands. What im looking for is to build the same tool myself. Thanks for the links i will go through them :) – anurag86 Oct 19 '15 at 08:57
  • Just suggesting me the right design and the datastructure should be good enough for me to start – anurag86 Oct 19 '15 at 08:59
  • 2
    Just the functions' names isn't enough; you also need the types and you need to perform scoped name resolution and overload resolution. (Consider e.g. `void foo() { struct { int operator()() {return 0;} } foo; foo(); }`, which isn't recursive.) – molbdnilo Oct 19 '15 at 09:09
  • 1
    I would suggest starting with plain C, for which the syntax is far easier and in which a conservative estimate for the reachability should be feasible. Once that is working and you've got the basic principles down you can consider tackling full C++ (_shudder_), and I fear that the modern meta-programming features will require deep semantic evaluation. Another wrinkle is that you'll need a fully-fledged pre-processor, since it may generate identifiers, and correctly forward the environment to it. This approach will also fall over when linking against other languages, obviously. – doynax Oct 19 '15 at 09:37
  • Step one of compilation is preprocessing, start with that. – Ulrich Eckhardt Oct 19 '15 at 10:26
  • Don't try to parse C++ source code yourself unless you want to go insane. As *yet more* examples of how bizarre and confusing it can get: ADL (briefly, the arguments to a function determine in part *which namespaces* are considered for looking up the function name itself); the "most vexing parse": `X x(Y());` looks like you're initialising an `X` object called `x` with the result of a freshly constructed object of type `Y`, but according to the language, you're actually *declaring a function` called `x`. – j_random_hacker Oct 19 '15 at 15:02

4 Answers4

1

I think you should not try to build a C++ parser from scratch, becuse of other said in comments that is really hard. IMHO, you'd better start from CLang libraries, than can do the low-level parsing for you and work directly with the abstract syntax tree.

You could even use crange as an example of how to use them to produce a cross reference table.

Alternatively, you could directly use GNU global, because its gtags command directly generates definition and reference databases that you have to analyse.

IMHO those two ways would be simpler than creating a C++ parser from scratch.

Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
1
  • This is a comment, rather than an answer, but I post it here because it's too long for a comment space.

There are lots of issues you should consider. First of all, you should not assume that main() is a first function in a source file.

Even if it is, there should be some functions header declarations before the main() so that the compiler can recognize their invocation in main.

Next, function's opening and closing brace needn't be in separate lines, they also needn't be the only characters in their lines. Generally, almost whole C++ code can be put in a single line!

Furthermore, functions can differ with parameters' types while having the same name (overloading), so you can't recognize which function is called if you don't parse the whole code down to the parameters' types. And even more: you will have to perform type lists matching with standard convertions/casts, possibly considering inline constructors calls. Of course you should not forget default parameters. Google for resolving overloaded function call, for example see an outline here

Additionally, there may be chains of unused functions. For example if a() calls b() and b() calls c() and d(), but a() itself is not called, then the whole four is unused, even though there exist 'calls' to b(), c() and d().

There is also a possibility that functions are called through a pointer, in which case you may be unable to find a call. Example:

int (*testfun)(int) = whattotest ? TestFun1 : TestFun2;  // no call
int testResult = testfun(paramToTest);  // unknown function called

Finally the code can be pretty obfuscated with #define–s.

Conclusion: you'll probably have to write your own C++ compiler (except the machine code generator) to achieve your goal.

CiaPan
  • 9,381
  • 2
  • 21
  • 35
1

The simplest approach for doing it yourself I can think of is:

  • Write a minimal parser that can identify functions. It just needs to detect the start and ending line of a function.
  • Programmatically comment out the first function, save to a temp file.
  • Try to compile the file by invoking the complier.
  • Check if there are compile errors, if yes, the function is called, if not, it is unused.
  • Continue with the next function.
alain
  • 11,939
  • 2
  • 31
  • 51
  • Even though the procedure wouldnt be feasible for files having _a lot_ of functions(and would require some logic to consider the member fnctions), its nice to seeing this approach from a different angle than the usual (by parser) – anurag86 Oct 19 '15 at 11:17
0

This is a very rough idea and I doubt it's very efficient but maybe it can help you get started. First traverse the file once, picking out any function names (I'm not entirely sure how you would do this). But once you have those names, traverse the file again, looking for the function name anywhere in the file, inside main and other functions too. If you find more than 1 instance it means that the function is being called and should be kept.

Irtza.QC
  • 1,026
  • 2
  • 10
  • 23
  • This logic is already covered in my 1,2,3,4 points. But my question is more about how would i do it(parsing is a big challenge because of varying syntax)? what design or datastructre do i use? And is there much better way to solve this? – anurag86 Oct 19 '15 at 08:07