There are three kinds of loops in C:
- "structured syntax" (while, for, ...)
[Watch out for GCC, which can hide statements therefore loops inside expressions using (stmt; exp) syntax!]
- ad hoc loops using goto; these interact with the structured syntax.
- recursion
To find the first kind, you have to find the structured syntax and the nesting.
Grep can certainly find the keywords (if you ignore false positives in comments and strings), but it can't find nested structures. You could of course use grep to find all the loop syntax and then simply inspect those that occurred in the same file to see if they were nested.
(If you wanted to do this without the false positives price, you could use our Source Code Search Engine, which knows the lexical syntax of C and is never confused as to when a string of characters is a keyword, a number, a string, etc.)
If you want to find those loops automatically, you pretty much need a full C parser with expanded preprocessing accomplished. (Otherwise some macro may hide a critical piece of loop syntax). Once you have a syntax tree for C, it is straightforward (although likely a bit inconvenient) to code something that clambers over the tree, detecting loop syntax nodes, and counting nesting of loops in subtrees. You can do this with any tool that will parse C and give you abstract sytnax trees. ANTLR can likely do this; I think there's a C grammar obtainable for ANTLR that handles C fairly well, but you'll have to run the preprocessor before using ANTLR.
You could also do this with our DMS Software Reengineering Toolkit with its C Front End. Our C Front End has a full preprocessor built in so it can read the code directly and expand as it parses; it also handles a relatively wide variety of dialects of C and character encodings (ever dealt with C containing Japanese text?). DMS provides an additional advantage: given a language (e.g., C) front end, you can write patterns for the that language directly using the language syntax. So we can express fragments of what we want to find easily:
pattern block_for_loop(t:expression,l:expression,i:expression, s: statements): statement
" for(\t,\l\,\i) { \s } ";
pattern statement_for_loop(t:expression,l:expression,i:expression, s: statement): statement
" for(\t,\l\,\i) \s ";
pattern block_while_loop(c:expression, s: statements): statement
" while(\c) { \s } ";
pattern statement_while_loop(c:expression): statement
" for(\c) \s ";
...
and group them together:
pattern_set syntactic_loops
{ block_for_loop,
statement_for_loop,
block_while_loop,
block_statement_loop,
...
}
Given the pattern set, DMS can scan a syntax tree and find matches to any set member, without coding any particular tree crawling machinery and without knowing a huge amount of detail about the structure of the tree. (There's a lot of node types in an AST for a real C parser!) Finding nested loops this way would be pretty straightforward: scan the tree top down for a loop (using the pattern set); any hits must be top level loops. Scan subtrees of a found loop AST node (easy when you know where the tree for the outer loop is) for additional loops; any hits must be nested loops; recurse if necessary to pick up loops with arbitrary nesting. This works for the GCC loops-with-statements stuff, too. The tree nodes are stamped with precise file/line/column information so its easy to generate a report on location.
For ad hoc loops built using goto (what, your code doesn't have any?), you need something that can produce the actual control flow graph, and then structure that graph into nested control flow regions. The point here is that a while loop that contains an unconditional goto out isn't really a loop in spite of the syntax; and an if statement whose then clause gotos back to code upstream of the if is likely really a loop. All that loop syntax stuff is really just hueristic hints you may have loop!
Those control flow regions contain the real nesting of the control flow; DMS will construct C flow graphs, and will produce those structured regions. It provides libraries to build and access that graph; this way you can get the "true" control flow based on gotos. Having found a pair of nested control flow regions, one can access the AST associated with parts of region to get location information to report.
GCC is always a lot fun due to its signficantly enhanced version of C. For instance, it has an indirect goto statement. (Regular C has this hiding under setjmp/longjmp!).
To figure out loops in the face of this, you need points-to analysis, which DMS also provides. This information is used by the nested region analysis. There are (conservative) limits to the accuracy of points-to analysis, but modulo that you get the correct nested region graph.
Recursion is harder to find. For this, you have to determine if A calls B ... calls Z calls A, where A and B and ... can be in separate compilation units. You need a global call graph, containing all the compilation units of your application. At this point, you are probably expecting me to say that DMS does that too, voila, I'm pleased to say it does. Constructing that call graph of course requires points-to anlaysis for function calls; yes, DMS does that too. With the call graph, you can find cycles in the call graph, which are likely recursion. Also with the call graph, you can find indirect nesting, e.g., loops in one function, that call a function in another compilation unit that also contains loops.
To find structures such a loops accurately you need a lot of machinery (and this will take some effort, but then C is a bitch of a language to analyze) and DMS can provide it.
If you don't care about accuracy, and you don't care about all the kinds of loops, you can use grep and manual procedures to get a semi-accurate map of just the loops hinted at by the structured loop statements.