5

I'm taking a compiler class right now and we're at the point where we have to build a CFG in order to implement optimizations. One thing I can't figure out is how many CFGs are there for a program? Every example I ever see seems to be the CGF of a simple code segment. So, if you have a program that has say three functions. Do you have a separate CFG for each function or is there one big CFG for the entire program?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Ian Burris
  • 6,325
  • 21
  • 59
  • 80

2 Answers2

4

Per-function CFGs are connected by callsites. If one function calls another, e.g.:

0  void foo() { /* do stuff */ }
1  void bar() { /* do stuff */ }
2
3  void baz() {
4     foo();  // Callsite for foo. Control transfers to foo, then foo returns here.
5     bar();  // Callsite for bar. Control transfers to bar, then bar returns here.
6  }

then the control graph for baz will include an edge that goes to the graph for foo. Likewise, because foo will eventually returns to baz (and to wherever else it might've been called from), there will be an edge from the end of foo's graph back to the statement after the call to foo in baz. Here, that next statement is the call to bar on line 5. At that point, there's an edge from the bar callsite to bar's CFG, and lines from exit points in bar back to the end of baz.

Basically all you need to think about is "what code executes next". That tells you where the edges should go in your control graph. A function call transfers control until the function returns, which implies an edge from the callsite to the function CFG and back again.

Note that not all full-program CFGs are connected graphs. If there is unreachable code in the program you're analyzing, then that will be its own unconnected piece of the full CFG. e.g. if you took out the call to bar in the above example, then bar would have its own graph off to the side, while baz and foo would be connected by edges.

Ghost4Man
  • 1,040
  • 1
  • 12
  • 19
Todd Gamblin
  • 58,354
  • 15
  • 89
  • 96
  • Humm... I guess one thing that confuses me is that in class the professor said that basic block should only have one in edge and at most two out edges (one to the next block the second to a label elsewhere if there is a branch). So if I connected the different graphs then the first block for a function could have more than one in edge and the last black could have more than two out edges. Any thoughts? Would you say that the 1 edge in 2 edge out statement is wrong? – Ian Burris Apr 07 '11 at 20:42
  • The way basic blocks are usually defined is so that they only have one entry point, but you can have multiple edges pointing to the entry point. You just can't jump into the *middle* of the block, because that's not how BB's are defined. For easy analysis, you want your blocks set up so that if *any* statement in the block is executed, *all* statements in the block are executed. – Todd Gamblin Apr 09 '11 at 04:29
  • You can still have more than one edge into the block, as long as it jumps to that first statement. This happens for functions, which can be called from multiple places, and for loop entries, where the block can be entered either from the code before the loop or from the branch at the end of the loop, if the loop isn't done yet. You usually call that edge from the end of the loop back to the beginning a back edge. – Todd Gamblin Apr 09 '11 at 04:30
  • As for the out-degree, you usually only have two edges out of basic blocks. If you had a three-way branch instruction, I suppose you could have more out edges, but I can't think of an architecture that has this off the top of my head. Maybe your prof is just trying to keep things simple for now. – Todd Gamblin Apr 09 '11 at 04:32
  • It's fairly easy to find examples of control flow graphs online -- just google for it. Note the number of in edges (>1) here: http://www.cs.arizona.edu/~collberg/Teaching/453/2009/Handouts/Handout-15.pdf – Todd Gamblin Apr 09 '11 at 04:34
  • Per function graphs usually will *not* include edges to the cfg of the functions being called. That would be equivalent to inlining the function. And how would you represent calls to already compiled library functions and what about recursive functions? Function calls are instead usually represented by a special node that contains the name of the function and the arguments it is being called with. – Björn Lindqvist Aug 01 '14 at 14:40
2

Well, you can construct a CFG for each function and then - if desirable for what you want to do - combine them into a complete one. Whole program CFGs can be pretty big, however, so they usually don't work well as examples.