4

I have written a C program that constructs a binary search tree from an array. It goes through the following steps:

1: Sort the array with qsort().

2: Place the sorted elements of the array into a binary tree using the recursive function treeify():

2a: Take the middle element of the array (by dividing its length by 2) and place that as the content field of the tree struct (the root node of this subtree).

2b: Function then copies the left and right halves of the remaining elements into smaller arrays and calls itself for each of these arrays respectively.

2c: Return the tree via the root node.

3: Recursively traverse the tree and print its contents in indented format.

Basically, I used a divide-and-conquer paradigm to build the tree from the already-sorted array. Surprisingly (since this was my first time designing a D&C algorithm) this part went rather smoothly.

Where I really ran into trouble was in Step 3. Sometimes it works, and when it does, all of the elements are in the right order, so that part obviously works. But 90% of times I run the program, it segfaults when it gets to the first leaf node.

Here is the full program text. I've altered the printing function so that it prints the addresses of the nodes (for debugging purposes). Originally it displayed the numeric values...

#include <stdio.h>
#include <stdlib.h>

struct tree {
    int content;
    struct tree *left;
    struct tree *right;
};

struct tree *treeify( int *, size_t );
void printtree( struct tree *, int );
int comp( int *, int * );

int main( int argc, char **argv ){
    int array[] = { 5, 6, 7, 2, 3, 4, 9, 1, 8, 0 };
    /* Sort array */
    qsort( (void *) array, 10, sizeof( int ), (int (*)(const void *, const void *)) &comp );
    for( int i = 0; i < 10; i++ ){
        printf( "%d ", array[i] );
    }
    printf( "\n" );
    /* Treeify array */
    struct tree *rootnode = treeify( array, 10 );
    /* Print tree */
    printtree( rootnode, 0 );
    return 0;
}

// Place sorted array elements in a tree
// Function is called for each subtree
struct tree *treeify( int *array, size_t size ){
    struct tree *root = (struct tree *) malloc( sizeof( struct tree ) );
    size_t middle = size/2;
    int leftsize = middle, rightsize = size-middle-1;
    int left[leftsize], right[rightsize];
    for( int i = 0; i < leftsize; i++ ) left[i] = array[i];
    for( int i = 0; i < rightsize; i++ ) right[i] = array[i+middle+1];
    root->content = array[middle];
    if( leftsize > 0 ) root->left = treeify( left, leftsize );
    if( rightsize > 0 ) root->right = treeify( right, rightsize );
    return root;
}

// Print tree contents in indented format
void printtree( struct tree *node, int level ){
    for( int i = 0; i < level; i++ ) printf( "  " );
    printf( "%x\n", &(node->content) );
    if( node->left ) printtree( node->left, level+1 );
    if( node->right ) printtree( node->right, level+1 );
}

// Comparison function for qsort
int comp( int *xp, int *yp ){
    int x = *xp, y = *yp;
    if( x < y ) return -1;
    if( x > y ) return 1;
    return 0;
}

I've managed to isolate the problem by printing the addresses of the nodes when traversing the tree. Here is the output from a successful run:

0 1 2 3 4 5 6 7 8 9 
cbe00000
  cbe00020
    cbe00040
      cbe00060
    cbe00080
      cbe000a0
  cbe000c0
    cbe000e0
      cbe00100
    cbe00120

And an unsuccessful run:

f04032b0
  f04032d0
    f04032f0
      f0403310
        0   
Segmentation fault: 11

Notice how the successful run only goes through three levels of the tree before returning and going back up. The unsuccessful run goes through four levels, reaching a null pointer.

Specifically, when the program gets to this line:

        if( node->left ) printtree( node->left, level+1 );

It takes the branch despite node->left evaluating to zero (as indicated by the fifth line of the output).

This is what I can't for the life of me understand. The condition is clearly evaluating to false (I've verified that), and yet the program is still taking that branch as if it evaluated to true (and only in most, not all, cases).

This has literally never happened to me before. I'll need someone who knows a lot more about C than I do to shed some light on this for me.

The only possibilities I can think of:

  • Some quirky compiler optimization

  • I made a stupid single-character error somewhere

  • My CPU is partially fried

Shoblade X
  • 363
  • 2
  • 9
  • Two behaviors means a single one known as *undefined behavior*. Try to run the code under [valgrind](http://www.valgrind.org) and check if there is any invalid read or write. – Iharob Al Asimi Mar 30 '18 at 14:47
  • 2
    When you allocate a new `tree`, you don't set `left = right = NULL`. – 001 Mar 30 '18 at 14:48
  • [don't cast malloc](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc) – Barmar Mar 30 '18 at 14:49
  • Yes the solution is to simply initialize `left` and `right` to `NULL`, there is no single `NULL` in your code. – Iharob Al Asimi Mar 30 '18 at 14:53
  • @JohnnyMopp I am using two variables called `right` and two variables called `left` in that function. I figured since one is a member of a struct, the struct would serve as a sort of namespace to separate it from the other. Am I mistaken in making this assumption? – Shoblade X Mar 30 '18 at 14:53
  • @IharobAlAsimi So the pointer is taking on the value of whatever was in that memory location before, since it's a stack-dynamic variable and its contents aren't automatically zeroed out when declared. That makes sense. – Shoblade X Mar 30 '18 at 14:54
  • The problem occurs here `if( leftsize > 0 ) root->left = treeify( left, leftsize );` When `leftsize == 0` the what will `root->left` point to? Who knows? Same for `right`. – 001 Mar 30 '18 at 14:56
  • 1
    The cast in front of `&comp` is a sign that you should change the prototype for your `comp` function to match the requirement. – mch Mar 30 '18 at 14:58
  • @JohnnyMopp Yes, now it all makes sense. Another case of sloppy programming I guess. – Shoblade X Mar 30 '18 at 14:58

2 Answers2

5

The problem is that you try to read from an unintialized member of the structure, the very first time it happens is here

if (node->left) 

in printtree() function.

Uninitialized values, are kept like that and attempting to read them is undefined behavior, so that's why your program doesn't always behave the same.

You need to initialize both members, in fact it woule be better to have

struct tree *create_node(int content)
{
    struct tree *node;
    node = malloc(sizeof(*node));
    if (node == NULL)
        return NULL;
    node->content = content;
    node->left = NULL;
    node->right = NULL;
    return node;
}

You should also,

  1. Avoid casting malloc() or any function returning void * because of what's discussed here.
  2. Check that malloc() doesn't return NULL before using the pointer.
Iharob Al Asimi
  • 52,653
  • 6
  • 59
  • 97
  • Yes, that makes sense now. Also, I realized that the value I'm printing is not the value that gets evaluated by the `if` statement. I'm printing `&(node->content)` whereas the conditional is evaluating `node`. That's why it seems to take the branch even though the address is `NULL`. – Shoblade X Mar 30 '18 at 15:07
1

For starters the function comp shall be declared like

int comp( const void *, const void * );

Secondly in the function treeify either the data member left of the data member right will have indeterminate value in case when leftsize or rightsize are equal to 0.

The function can be implemented simpler without using auxiliary arrays.

struct tree * treeify( const int *array, size_t size )
{
    struct tree *node = NULL;

    if ( size )
    {
        node = malloc( sizeof( struct tree ) );
        size_t middle = size / 2;

        node->content = array[middle];

        node->left = treeify( array, middle );
        node->right = treeify( array + middle + 1, size - middle - 1 );
    }

    return node;
}

The function printtree( is wrong. For example it does not check whether the first parameter is equal to NULL.

Here is a demonstrative program

#include <stdio.h>
#include <stdlib.h>

struct tree 
{
    int content;
    struct tree *left;
    struct tree *right;
};

int comp( const void *, const void * );
struct tree * treeify( const int *, size_t );
void printtree( const struct tree *, int level );

int main(void) 
{
    int array[] = { 5, 6, 7, 2, 3, 4, 9, 1, 8, 0 };
    const size_t N = sizeof( array ) / sizeof( *array );

    qsort( array, N, sizeof( *array ), comp );

    for ( size_t i = 0; i < N; i++ ) printf( "%d ", array[i] );
    putchar( '\n' );

    struct tree *rootnode = treeify( array, N );

    printtree( rootnode, 0 );

    return 0;
}

int comp( const void *left, const void *right )
{
    int x = *( const int * )left;
    int y = *( const int * )right;

    return ( y < x ) - ( x < y );
}

struct tree * treeify( const int *array, size_t size )
{
    struct tree *node = NULL;

    if ( size )
    {
        node = malloc( sizeof( struct tree ) );
        size_t middle = size / 2;

        node->content = array[middle];

        node->left = treeify( array, middle );
        node->right = treeify( array + middle + 1, size - middle - 1 );
    }

    return node;
}

void printtree( const struct tree *node, int level )
{
    if ( node )
    {
        printf( "%*s", level, "" );
        printf( "%d\n", node->content );
        if( node->left ) printtree( node->left, level + 1 );
        if( node->right ) printtree( node->right, level + 1 );
    }
}

Its output is

0 1 2 3 4 5 6 7 8 9 
5
 2
  1
   0
  4
   3
 8
  7
   6
  9
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • Yes, that does make more sense to use pointer arithmetic rather than copying the array. For some reason I was thinking that only worked with strings that have null-terminators, but since the length of the array is given as a parameter, that should be sufficient. – Shoblade X Mar 30 '18 at 15:26
  • I like your `comp()` function better. Far more elegant than an `if` clause. – Shoblade X Mar 30 '18 at 15:39