4

Consider this code (quoted from geeksforgeeks.org, author Tushar Roy) which calculates true or false if the path from a root to a leaf has keys that sum to a specified value:

bool hasPathSum(struct node* node, int sum)
{
  /* return true if we run out of tree and sum==0 */
  if (node == NULL)
  {
     return (sum == 0);
  }

  else
  {
    bool ans = 0; 

    /* otherwise check both subtrees */
    int subSum = sum - node->data;

    /* If we reach a leaf node and sum becomes 0 then return true*/
    if ( subSum == 0 && node->left == NULL && node->right == NULL )
      return 1;

    if(node->left)
      ans = ans || hasPathSum(node->left, subSum);
    if(node->right)
      ans = ans || hasPathSum(node->right, subSum);

    return ans;
  }
}

In this code, the author has used the logical OR operator in his assignments to a variable ans which he returns to avoid overwriting a return of true with a return of false. I have refactored the code into:

int hasPathSum(Treelink tree, int sum){
    if ( !tree ) { //succesful path found
        return (sum == 0);
    } else {
        sum -= tree->item;
        if ( (sum == 0) && (!tree->left && !tree->right)) 
            return 1;
        return hasPathSum(tree->left, sum) || hasPathSum(tree->right, sum);
    } 
}

Although it is obvious with this case that using a temporary variable and/or the logical OR operator are effective in preventing overwrites of recursive returns, what are some best methods for carrying a value up a recursive call?

EDIT

Conversely, let us consider a slightly contrived example; Given an improperly ordered binary tree (say, key 100 as root with left child's key as 5 and right child's key as 2), find the minimum key from the tree recursively. I would naively approach this as so:

Disclaimer: written but not compiled.

int findMin(Tree tree, int min) {
    if (!tree) return 0; //defensive check

    if ( tree->item < min && (tree->left || tree->right) ) { 
        //Can go deeper and reestablish the min in both directions
        if ( tree->left ) 
            min = findMin(tree->left, tree->item);
        if ( tree->right ) 
            min = findMin(tree->right, tree->item);
    } else if ( tree->item >= min && (tree->left || tree->right ) ) {
        //can go deeper but should not reestablish the min
        if ( tree->left ) 
            min = findMin(tree->left, min);
        if ( tree->right ) 
            min = findMin(tree->right, min);
    } else {
        return min; //return min's final value
    }
}

It would presumably be called via findMin(testTree, testTree->item). Fundamentally; we are forced to go down both branches from the root to explore and find a key that might be anywhere in the tree (even the root itself!) We know we have to make assignments to 'pull up' the right key, however this would most probably overwrite the 'true' min of the previous call (the way this is currently written) unless we did some kind of bounded check on each assignment e.g.

tmpmin = findMin(tree->left);
min = (min < tmpmin) ? min : tmpmin;  

We could also perhaps have two separate variables (as this is a binary tree, after all) that specify the left min and the right min and merely return the min of the two. In these cases I am still using some kind of temporary variables to piggy back to its original caller. Ultimately, I am curious about generic heuristics regarding avoiding these types of mistakes when designing recursive algorithms.

rjs
  • 838
  • 2
  • 10
  • 21
  • Your use of the term "overwriting returns" is very confusing. You are not "overwriting" anything, you have a function which returns `true` if any of the recursive calls made by it returned `true` and returns `false` otherwise. – Fernando Jan 14 '15 at 05:14
  • Your "refactored" version is different; it will call `hasPathSum(NULL` in the case that exactly one of `left` and `right` is NULL; whereas the original does not. – M.M Jan 14 '15 at 05:27
  • @Fernando yeah, I'm trying to think of a better term but I'm at a loss at the moment. I realized writing recursive solutions to problems that sometimes they won't work because I'll "clobber results" unless I do some kind of check and have that ensure the right result gets pulled back. – rjs Jan 14 '15 at 05:42
  • 1
    @MattMcNabb actually, the refactored version should give the same results as the original, and it doesn't matter if he calls `hasPathSum(NULL)` because he handles it properly. He could even remove `if ( (sum == 0) && (!tree->left && !tree->right)) return 1;` and still get the same results. – ecraig12345 Jan 14 '15 at 07:07
  • The question for heuristics is somewhat general... I hope I gave you some idea how to structure recursive traversal functions at least. The original example code was fairly ugly to be honest and obscured what happened. – phoku Jan 14 '15 at 10:17
  • 1
    It should be worth noting that attempting to "preserve" values in a recursive call, is no different than if you treat it as a single function call. You evaluate the results of some set of calculations and return the one that is "correct". If you're returning to/from a recursive call, it will just work if your algorithm is designed correctly. A non BST can still be evaluated for a min and any well-designed algorithm that can't rely on the definition of a binary search tree (unordered) will use the standard divide and conquer approach to find min values in both children, and return the lesser. – Ryan J Jan 14 '15 at 17:27

2 Answers2

3

Here is the original example without optimization. You'll see that the algorithm is much cleaner now.

bool hasPathSum(struct node* node, int sum)
{
  /* return true if we run out of tree and sum==0 */
  if (node == NULL)
     return (sum == 0);

  /* otherwise check both subtrees */
  int subSum = sum - node->data;
  return hasPathSum(node->left, subSum) ||  hasPathSum(node->right, subSum);
}

With regards to the actual question - how to return values in this sort of recursion where there's no constraint on the data in the data structure. Returning the current minimum for the sub-tree (or whatever data structure you actually look at) works just fine.

Simply consider the problem without regarding the actual data structure.

current = useful_start_value
for value in data:
    if binaryOp(current, value):
       current = value   

So for every value you have to test it against the current data. For the findMin function this results in the following code. I've adapted to use the original data structures.

int findMin(struct node* node, int min) {
    if (!node) return min;
    if (node->data < min)
       min = node->data;

    int leftMin = findMin(node->left, min);
    if(leftMin < min)
       min = leftMin;

    int rightMin = findMin(node->right, min);
    if(rightMin < min)
       min = rightMin;

    return min;
}

As a side - written like this you can also see the actual order the nodes are visited cleanly.

My general advice would be:

  • Don't optimize early (and badly)
  • Do the recursive calls at exactly one position in the source code (if you can).
  • Try to treat the problem as a local problem - as in findMin we just had to find the minimum of three values - so we did that.

I've uploaded a complete working example with additional functions for testing and generating trees for testing.

phoku
  • 2,082
  • 1
  • 18
  • 16
  • Thanks so much, these are great points. Do you think I should modify the original title to reflect this as something more about preserving values and (in that, as Ryan J pointed out above) more succinct recursive algorithm design? – rjs Jan 14 '15 at 20:53
  • Also, just tried to compile and I think you left out time.h for when you call srand (when compiled with `gcc -Wall` throws up a warning of implicit declaration of time). I like how you display the findMin function works with a potentially incorrect call. – rjs Jan 14 '15 at 21:46
  • @RJS Glad I could help you :) I'll add the missing time.h a bit later. As for renaming, that might actually be a good idea, though "succinct recursive algorithm design" sounds a bit grandiose/broad. – phoku Jan 15 '15 at 07:51
  • I'd second "try to treat the problem as a local problem" as a good way to think about recursion. – ecraig12345 Jan 15 '15 at 22:18
2

EDIT: I misunderstood the original question, so phoku's answer is much more relevant.


The logical or operator in C (and most other languages) does what's called short-circuiting: it stops evaluating operands as soon as it finds one that's true. So if hasPathSum(tree->left, sum) returns 1, hasPathSum(tree->right, sum) will never get called.

This question has a more detailed explanation. It's written for Java, but the operators in C work the same way.

Community
  • 1
  • 1
ecraig12345
  • 2,328
  • 1
  • 19
  • 26
  • 1
    The logic will only short-circuit if the left side returns true (non-0). If the left side returns false (0), it will go on to evaluate the right side of the tree. – ecraig12345 Jan 14 '15 at 05:26
  • I guess I'm not understanding what you mean by "overwriting." The function is essentially answering a yes or no question, so once you know "yes, a path to a leaf with the given sum exists," why would you need any more information? If you were answering a question that doesn't have a yes/no answer, like "what is the largest sum on any path?" or "what is the path with the largest sum?" then wanting to store or return more information would make sense. – ecraig12345 Jan 14 '15 at 05:54
  • It does seem like you have a valid question, I'm just not quite sure yet what it is (and I did misunderstand it initially, sorry). I might try to say more later about what I think you might be getting at, but it would help if you could find/invent another example that illustrates your concern more clearly. – ecraig12345 Jan 14 '15 at 07:34
  • I've tried including an example with pseudocode that might help my question's clarity. Let me know if its still muddy and I'll try to work finessing the question more. I'm mostly just curious about avoiding pitfalls in recursive designs when having to bring values up from (potentially) lots of calls. – rjs Jan 14 '15 at 08:19