2

I am using this particular code for creating a binary tree and the code itself is working, but I seem to be lacking a critical concept to understand the code.

Node structure

typedef struct node{
  int data;
  struct node *left,*right;
}* Node;

The create function

Node create(){
    Node root = (Node) malloc(sizeof(struct node));
    int x;
    scanf("%d",&x);
    if(x==-1) //create a null node
      return NULL;
    root->data=x;
    root->left = create();
    root->right = create();
}

The call for the function is

Node root=create();

In the create function there is no pointer to the root node being passed.
Within the function, a new node is being created, allocated memory and value of the node is assigned. But nowhere in the code is the pointer to the newly created node being passed or returned.

So how does this code work?

PS: The return type is Node. I think this is relevant but I have no idea as to why it is so. It might as well as be void as no value is returned.

Max
  • 21,123
  • 5
  • 49
  • 71
arsonist_50
  • 41
  • 1
  • 6
  • 1
    This function is broken. It does not return a value as it should. – Eugene Sh. Sep 15 '17 at 19:47
  • This code works? How are you using and testing the value of root after create call – Tyler Sep 15 '17 at 19:47
  • A particular case of input. 0 1 2 -1 -1 3 4 -1 -1 5 -1 -1 6 -1 7 -1 8 9 10 -1 -1 11 -1 -1 12 -1 -1 I am using some other function to traverse and search, but those have the usual implementation – arsonist_50 Sep 15 '17 at 19:49
  • Again. this code is invoking *undefined behavior*, as the function of a non-void type does not return a value, yet the (non-existent) return value is assigned to something. – Eugene Sh. Sep 15 '17 at 19:54
  • @Tyler I can access values within the tree like one would normally do. root->left->data gives me the expected value – arsonist_50 Sep 15 '17 at 19:54
  • @arsonist_50 you can only do that inside the function create – savram Sep 15 '17 at 19:56
  • @EugeneSh. Exactly my point. How is this undefined behavior working? How is the return value being assigned? This is actually a very common implementation and I have seen a lot of websites using this. – arsonist_50 Sep 15 '17 at 19:56
  • 3
    It is not working. It is undefined. It can work *accidentally*. Give us please a link to this implementation... – Eugene Sh. Sep 15 '17 at 19:57
  • It's UB, like @EugeneSh. says. It seems to work because you are unlucky and some stuff happens to line up, (today). Having code like this 'work' is just about the worst outcome:( – Martin James Sep 15 '17 at 19:58
  • 'This is actually a very common implementation and I have seen a lot of websites using this' - please link them, so that they can be avoided in future. In fact, given your username, maybe you could burn down those sites for us:) – Martin James Sep 15 '17 at 19:59
  • 1
    Ugh - in addition to the actual problems pointed out by everyone else, node creation and input should be separate operations. Also, you're creating an unordered tree, which ... doesn't seem very useful. – John Bode Sep 15 '17 at 20:09
  • 1
    Sidenote: **Never ever** `typedef` a pointer. This will only cause trouble to the reader and makes good coding style problematic. – too honest for this site Sep 15 '17 at 20:47

4 Answers4

4

OK, first off, there is a clear and straightforward bug in this function...

Node create(){
    Node root = (Node) malloc(sizeof(struct node));
    int x;
    scanf("%d",&x);
    if(x==-1) //create a null node
      return NULL;
    root->data=x;
    root->left = create();
    root->right = create();
}

which the compiler should have pointed out to you:

test.c: In function ‘create’:
test.c:18:1: warning: control reaches end of non-void function [-Wreturn-type]

(If your compiler ate this code without any complaints at all, turn on warnings. GCC in particular is way too permissive by default; if you're using GCC you should give it the -Wall option essentially always, and probably a bunch more -Wsomething options as well.)

The bug is simply that there's a missing last line:

return root;

Now, your actual question appears to be "I know there's a bug, but why does it work anyway?" and before I get into that, I need you to acknowledge that you understand that, if it works, it only works by accident, on the computer and with the compiler you happen to be using right now. create has what we call "undefined behavior", which means that it's allowed to do anything at all, including appearing to work correctly, but also failing catastrophically.

There is a plausible explanation for why it works anyway: on many CPUs, the register used to return pointers is also a convenient scratch register, so when compiling the statement

root->right = create();

the compiler may place root in the return-value register in order to perform the memory access root->right = .... As there are no operations after that point, the generated machine code "falls off the end", without officially putting a return value in that register, but also not removing the local variable root from that register. And so it appears to work.


Incidentally, there are a bunch more bugs in this function:

Node create(){

This has one mostly-harmless semantic error and one style error; it should be written

Node create(void)
{

In C, for historical reasons, you must write (void), not (), to define a function that takes no arguments. This is technically defining a function that takes an unspecified number of arguments, which means the compiler won't complain if you call create with arguments.

In C, unlike in some other languages you may be familiar with, the opening curly brace of a function definition should always be placed on its own line.

    Node root = (Node) malloc(sizeof(struct node));

In C (unlike C++), do not cast the return value of malloc. It is not necessary, and if you don't, the compiler will remind you when you forgot to include stdlib.h and therefore your pointers are being truncated to the width of int.

    int x;
    scanf("%d",&x);

Never use scanf in production code. Also, you are not checking for input errors.

    if(x==-1) //create a null node
      return NULL;

This leaks the just-allocated node when x is equal to -1. This check should occur before the call to malloc.

Also, your indentation is inconsistent: you used 4-space indent for the first level and 2-space indent for the second level. It doesn't matter what indentation style you use, but it does very much matter that you pick a style and stick to it throughout your program.

(And don't use tabs, because programs indented with tabs don't look the same for everybody. 8-space indent is fine, but they should actually be eight spaces.)

zwol
  • 135,547
  • 38
  • 252
  • 361
  • Thanks for the detailed explanation. Yes, I knew there was some kind of bug, but I still wanted to know why it was working. I didn't know about the -Wall flag. I will be careful about compiling such kind of code in future – arsonist_50 Sep 15 '17 at 20:12
  • @arsonist_50 You accepted the answer while I was editing it, please reread. – zwol Sep 15 '17 at 20:19
  • Why should the curly brace always be on its own line? I write it like this too but I always thought it's just a matter of style. The typedef of `Node` as a pointer bothers me much more than this. – alain Sep 15 '17 at 20:24
  • @zwol i used to code previously in c++, so i guess i was still following some of its idiosyncrasies. Thanks for pointing it out. I used pass the parameter as void but I never understood why, so I stopped using it – arsonist_50 Sep 15 '17 at 20:30
  • 1
    @alain The _original_ reason was that in pre-C89 code you had to put it on its own line because the function parameter declarations went in between the (...) and the {. Nowadays, it is just that putting the curly brace on the same line as the function head looks wrong to people familiar with the language, and is inconsistent with the way existing large codebases are formatted (and if you're working on C you're probably working on an existing large codebase). – zwol Sep 15 '17 at 20:31
  • @arsonist_50 Yeah, the (void) thing is an annoying vestige of the language as it was in the 1970s and 80s - yes, really, that long ago - but we're stuck with it. In C++ it is indeed unnecessary. – zwol Sep 15 '17 at 20:33
  • @zwol seems like I should read some style guides for C, so that I can be aware of more bugs like that – arsonist_50 Sep 15 '17 at 20:35
  • Ah interesting, I wasn't aware of that. Nice answer. – alain Sep 15 '17 at 20:37
  • You advised `Node create(void) { ... }`. It's true that the external declaration ` `Node create();` declares a function that takes unspecified arguments, but are you sure that's true for definitions? I would have thought that, even today, `Node create() { ... }` defined `create()` as accepting 0 arguments. – Steve Summit Sep 15 '17 at 20:59
  • @SteveSummit Hmm, yes, you're right. C11 6.7.6.3p15 "An empty list in a function declarator that is part of a definition of that function specifies that the function has no parameters." However, experimenting with OP's code, I can add parameters to the recursive calls to `create` and no compiler I have convenient access to offers the slightest objection (see https://godbolt.org/g/ybEBPT) – zwol Sep 15 '17 at 21:13
1

The problem here is that you are omitting the return statement at the end of create. C compilers are surprisingly permissive when it comes to this and will often compile just fine (though with warnings. You do have warnings enabled don't you!?).

Most likely the function is returning some arbitrary value sitting around in a register or in some stack frame, which luckily is the address of root so everything works as expected (see Function with missing return value, behavior at runtime).

If you add return root; at the end of the function, it should continue working as before minus the undefined behavior.

Max
  • 21,123
  • 5
  • 49
  • 71
  • I do have warnings enabled but I didn't get any during compiling the code. The code does work after adding return root; at the end. So it is by luck that the code is working properly. – arsonist_50 Sep 15 '17 at 20:07
0

There is no use of create() function until it returns pointer which is pointing to dynamically allocated memory.

kadina
  • 5,042
  • 4
  • 42
  • 83
  • @EugeneSh. That is the reason I commented, it should return pointer which is pointing to dynamically allocated memory. – kadina Sep 15 '17 at 19:51
  • Ah, OK, I have interpreted the answer in some other way. Anyway, this code is invoking UB. – Eugene Sh. Sep 15 '17 at 19:52
0

Congratulations, you found a memory leak. The function is allocating resources on the heap but it doesn't pass control over to you or frees it.

savram
  • 500
  • 4
  • 18
  • But I am able to access it outside the create() function. Is that okay? – arsonist_50 Sep 15 '17 at 20:00
  • No. If you really can access the node structure like this, you are dealing with undefined behavior. And being lucky with the results, that's all – savram Sep 15 '17 at 20:04
  • @savram: more like *un*lucky, because this code is fooling the OP into believing it's working properly, until the day it decides not to. – John Bode Sep 15 '17 at 20:06