I'm building a didactic compiler, and I'd like to check if the function will always return a value. I intend to do this in the semantic analysis step (as this is not covered by the language grammar).

Out of all the flow control statements, this didactic language only has if, else, and while statements (so no do while, for, switch cases, etc). Note that else if is also possible. The following are all valid example snippets:

a)

if (condition) {
    // non-returning commands
}
return value

b)

if (condition) {
    return value
}
return anotherValue

c)

if (condition) {
    return value1
} else {
    return value2
}
// No return value needed here

I've searched a lot about this but couldn't find a pseudo-algorithm that I could comprehend. I've searched for software path testing, white-box testing, and also other related Stack Overflow questions like this and this.

I've heard that this can be solved using graphs, and also using a stack, but I have no idea how to implement those strategies.

Any help with pseudocode would be very helpful!

(and if it matters, I'm implementing my compiler in Swift)

Roger Oba

2 Answers

1

If you have a control flow graph, checking that a function always returns is as easy as checking that the implicit return at the end of the function is unreachable. So since there are plenty of analyses and optimizations where you'll want a CFG, it would not be a bad idea to construct one.
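As a rough illustration of that check, here is a minimal sketch of a reachability test over a control flow graph. The `BasicBlock` type, the index-based successor lists, and the convention that a block ending in an explicit return has no successors are all assumptions made for this example, not something from the question's compiler.

```swift
/// A basic block in a hypothetical control flow graph.
/// Assumption: a block that ends in an explicit `return` has no successors.
struct BasicBlock {
    var successors: [Int]  // indices of the blocks control may flow to next
}

/// Returns true if `target` can be reached from `entry` by following successor edges.
func isReachable(_ target: Int, from entry: Int, in blocks: [BasicBlock]) -> Bool {
    var visited = Set<Int>()
    var worklist = [entry]
    while let current = worklist.popLast() {
        if current == target { return true }
        // Skip blocks that were already expanded.
        if !visited.insert(current).inserted { continue }
        worklist.append(contentsOf: blocks[current].successors)
    }
    return false
}

// Snippet (c) from the question: block 0 tests the condition, blocks 1 and 2
// each end in a return, and block 3 is the implicit return at the end.
let snippetC = [
    BasicBlock(successors: [1, 2]),
    BasicBlock(successors: []),
    BasicBlock(successors: []),
    BasicBlock(successors: []),
]
let snippetCAlwaysReturns = !isReachable(3, from: 0, in: snippetC)

// An `if` without `else` and no trailing return: block 1 falls through to block 2,
// the implicit return, so the function can fall off the end.
let fallsOffEnd = [
    BasicBlock(successors: [1, 2]),
    BasicBlock(successors: [2]),
    BasicBlock(successors: []),
]
let fallsOffEndAlwaysReturns = !isReachable(2, from: 0, in: fallsOffEnd)
```

Building the graph from the AST is the larger part of the work; the check itself is an ordinary graph traversal.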

That said, even without a control flow graph, this property is pretty straightforward to check under some common restrictions. Specifically, you have to be okay with something like if(cond) return x; if(!cond) return y; being seen as falling off the end, even though it's equivalent to if(cond) return x; else return y;, which would be allowed. I also assume there's no goto because you didn't list it among your control flow statements. (I make no assumptions about break and continue because those only appear within loops, and loops don't matter.)

We just need to consider the cases of what a legal block (i.e. one that always reaches a return) would look like:

So an empty block would clearly not be allowed because it can't reach a return if it's empty. A block that directly (i.e. not inside an if or loop) contains a return would be allowed (and if it isn't at the end of the block, everything after the return in the block would be unreachable, which you might also want to turn into an error or warning).

Loops don't matter. That is, if your block contains a loop, it still has to have a return outside of the loop even if the loop contains a return because the loop condition may be false, so there's no need for us to even check what's inside the loop. This wouldn't be true for do-while loops, but you don't have those.

If the block directly contains an if with an else and both the then-block and the else-block always reach a return, this block also always reaches a return. In that case, everything after the if-else is unreachable. Otherwise the if doesn't matter just like loops.

So in pseudo code that would be:

alwaysReturns( {} ) = false
alwaysReturns( {return exp; ...rest} ) = true
alwaysReturns( { if(exp) thenBlock else elseBlock; ...rest}) =
    (alwaysReturns(thenBlock) && alwaysReturns(elseBlock)) || alwaysReturns(rest)
alwaysReturns( {otherStatement; ...rest} ) = alwaysReturns(rest)
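Since the question mentions Swift, the rules above could be sketched roughly as follows. The `Statement` enum is a hypothetical stand-in for whatever AST the compiler really has, and an `else if` is assumed to be represented as an else-block containing a single `if`.

```swift
/// Hypothetical AST covering only the statement kinds named in the question.
enum Statement {
    case returnStatement
    case ifStatement(thenBlock: [Statement], elseBlock: [Statement]?)
    case whileLoop(body: [Statement])
    case other
}

/// True if every path through `block` reaches a `return`, treating every
/// condition as if it could be either true or false.
func alwaysReturns(_ block: [Statement]) -> Bool {
    for statement in block {
        switch statement {
        case .returnStatement:
            // A direct return: anything after it in this block is unreachable.
            return true
        case let .ifStatement(thenBlock, elseBlock?):
            // An if/else counts only when both branches always return.
            if alwaysReturns(thenBlock) && alwaysReturns(elseBlock) { return true }
        case .ifStatement, .whileLoop, .other:
            // An `if` without `else`, loops, and other statements don't matter.
            continue
        }
    }
    // The end of the block was reached without a guaranteed return
    // (this also covers the empty block).
    return false
}

// Snippet (c) from the question, and the same `if` with the `else` removed.
let snippetC: [Statement] = [
    .ifStatement(thenBlock: [.returnStatement], elseBlock: [.returnStatement])
]
let missingElse: [Statement] = [
    .ifStatement(thenBlock: [.returnStatement], elseBlock: nil)
]
let snippetCResult = alwaysReturns(snippetC)
let missingElseResult = alwaysReturns(missingElse)
```

The loop plays the role of the `...rest` recursion in the pseudo code: moving on to the next statement is the same as evaluating `alwaysReturns(rest)`.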
sepp2k
  • That's an interesting way of thinking! 5 hours after posting this question, I figured out a solution (that I couldn't break until now so I assume it's right haha), but it's still great to see other points of view. What I would just correct in your analysis, is that the `while` loops matter if the condition is always true, e.g. `while (true) { return value }; // Dead code from now on`, do you agree? :) – Roger Oba Nov 25 '18 at 05:12
  • @RogerOba Yeah, my simplifying assumption was basically that we treat all conditions as if they could always be true or false. If you want to handle `while(true)` specially, then yes, you'll need to care whether the loop contains a return. Note that then you'll also need to care about `break` and `continue` (which I assume your language must have for `while(true)` to even make sense). – sepp2k Nov 25 '18 at 05:22
  • It actually doesn't haha and I didn't handle the `while(true)` because I can't tell if the condition is always true at compile-time (with the current implementation). – Roger Oba Nov 25 '18 at 05:27
  • @RogerOba If it's literally `while(true)`, you know that it's always true, but if it's `while(someExp)`, checking whether `someExp` is always true would be undecidable in the general case. – sepp2k Nov 25 '18 at 05:42
  • This can actually get very complicated as the language syntax gets more complicated. Possibly it has to understand inner functions and closures, and it has to understand enums to see whether all cases in a switch are covered. It has to understand functions that never return (`fatalError`). It's not an easy job to do without actual syntax/semantic parsing. – Sulthan Nov 25 '18 at 16:24
  • @Sulthan you mean expression evaluation to check if it's always true? Yeah we can only assume constant values, so it's also only possible if the language has a distinction between constants and variables (like Swift's `let` VS `var`). Definitely all those scenarios add a lot more extra complexity to the analyzer, luckily my language only has if/else statements! Swift's `Never` would be interesting to analyze as well – Roger Oba Nov 25 '18 at 16:29
  • @Sulthan I'm not sure what you mean by "semantic parsing", but obviously this check would be performed on the AST after parsing (and if it gets complicated enough, you'll probably want to use a CFG after all). You're right that it gets more complicated, the more features the language has (though I'm not sure how closures would complicate anything, unless it's like in Ruby where the inner function can return from the outer one) - here I've just covered the features listed in the question. – sepp2k Nov 25 '18 at 16:31
0

So, after 5 hours of thinking about how to implement this, I came up with a decent solution (at least I haven't been able to break it so far). I actually spent more of that time browsing the web (with no luck) than thinking about the problem and trying to solve it on my own.

Below is my implementation (in Swift 4.2, but the syntax is fairly easy to pick up), using a graph:

final class SemanticAnalyzer {
    private var currentNode: Node!
    private var rootNode: Node!

    final class Node {
        /// One entry per `if` found directly in this block: its then-branch and its else-branch.
        var branches: [(ifNode: Node, elseNode: Node)] = []
        var returnsExplicitly = false
        let parent: Node?
        /// Else-branch of the most recent `if`, entered by `handleElseStatementFound()`.
        var elseNode: Node!
        /// True if this block has a direct `return`, or contains an `if`/`else`
        /// whose branches both always return.
        var alwaysReturns: Bool {
            return returnsExplicitly
                || branches.contains { $0.ifNode.alwaysReturns && $0.elseNode.alwaysReturns }
        }

        init(parent: Node?) {
            self.parent = parent
        }

        func validate() -> Bool {
            return alwaysReturns
        }
    }

    /// Initializes the components of the semantic analyzer.
    func startAnalyzing() {
        rootNode = Node(parent: nil)
        currentNode = rootNode
    }

    /// Execute when an `if` statement is found.
    func handleIfStatementFound() {
        let ifNode = Node(parent: currentNode)
        let elseNode = Node(parent: currentNode)
        // If no `else` follows, `elseNode` stays empty, so this pair never counts as returning.
        currentNode.branches.append((ifNode: ifNode, elseNode: elseNode))
        currentNode.elseNode = elseNode
        currentNode = ifNode
    }

    /// Execute when an `else` statement is found.
    func handleElseStatementFound() {
        currentNode = currentNode.elseNode
    }

    /// Execute when a branch scope is closed.
    func handleBranchClosing() {
        currentNode = currentNode.parent! // If we're in a branch, the parent node is never nil
    }

    /// Execute when a function return statement is found.
    func handleReturnStatementFound() {
        currentNode.returnsExplicitly = true
    }

    /// Determine whether the function analyzed always returns a value.
    ///
    /// - Returns: whether the root node validates.
    func validate() -> Bool {
        return rootNode.validate()
    }
}

Basically what it does is:

  1. When it finds an if statement, it creates 2 new nodes and points the current node to both of them (as in a binary tree node).
  2. When the else statement is found, we just switch the current node to the else node created previously in the if statement.
  3. When a branch is closed (e.g. in an if statement's } character), it switches the current node to the parent node.
  4. When it finds a function return statement, it can assume that the current node will always have a return value.

Finally, to validate a node, either the node has an explicit return value, or all of the nodes must be valid.

This works with nested if/else statements, as well as branches without return values at all.

Roger Oba