1

I wrote a binary search tree to store some sorted words. As is often the practice, I do this by allocating new block of memory for the binary tree every time a new word come in. But, strangely, I can only allocating new memory for the binary search tree twice, which means that at the first and second time everything was fine but the program crash at the third memory allocation.

Here is my code:

inputWord.c

/* I pass in the firstNode, and the word I wanna store, and its quantity as argument*/
int inputWord(BSTnode* Node,char* word,int num){

    BSTnode* ptr=Node;           //ptr was defined to track the location of the node.
    while(1){
        if(stricmp(word,ptr->data)>0){
                 /*If the current node already have a rightchild then ptr move to it, and do comparison again*/
            if(ptr->rightchild!=NULL){
                ptr=ptr->rightchild;
                printf("Moving to another (right) node now!!\n");
                continue;            
            }
               /*If the current node have no rightchild, then make a new one for it and store the word and its quantity*/
            else{
                ptr->rightchild=malloc(sizeof(BSTnode));
                if(!(ptr->rightchild))
                    return 1;
                ptr=ptr->rightchild;
                ptr->rightchild=NULL;
                ptr->leftchild=NULL;
                strcpy(ptr->data,word);
                ptr->num=num;
                break;
            }
        }

        else if(stricmp(word,ptr->data)<0){
                    /*it's all the same as the rightchild part*/
            if(ptr->leftchild!=NULL){
                ptr=ptr->leftchild;
                continue;
            }
            else{
                ptr->leftchild=malloc(sizeof(BSTnode));
                if(!(ptr->leftchild))
                    return 1;
                ptr=ptr->leftchild;
                ptr->leftchild=NULL;
                ptr->rightchild=NULL;
                strcpy(ptr->data,word);
                ptr->num=num;
                break;
            }
        }

            /*If the word have already been stored in the tree, print out this message*/
        else{
            fprintf(stdout,"It is exactly the same word!!\n");
            return 0;
        }
    }

    return 0;
}

I have make some necessary comments above to help you understand my intention.Hopefully that would help.

As you can see, that function was pretty straight and simple. And it did work for the first two invokation.But it crash when invoked the third time!!(always the third time).

So I made some test. And now I am pretty sure that it crash at the line

ptr->leftchild=malloc(sizeof(BSTnode));

(make it clear that the data offirstNode is initialize with "" for comparison. And I pass in the word "The" first and "Project" second and "Gutenberg" third. And the structure of BSTnode is

typedef struct BSTnode{
    char data[20];
    struct BSTnode* leftchild;   
    struct BSTnode* rightchild;  
    int num;

}BSTnode;

)

How I make that test is listed as below. (It is the same code, only with some extra print statement for test)

int inputWord(BSTnode* Node,char* word,int num){

  printf("Enter inputWord() successfully!!\n");

    BSTnode* ptr=Node;
    while(1){
        if(stricmp(word,ptr->data)>0){
            if(ptr->rightchild!=NULL){
                ptr=ptr->rightchild;
                printf("Moving to another (right) node now!!\n");
                continue;
            }
            else{
                printf("I need a new rightchild!!\n");
                ptr->rightchild=malloc(sizeof(BSTnode));
                printf("New rightchild created successfully!!\n");
                if(!(ptr->rightchild))
                    return 1;
                ptr=ptr->rightchild;
                ptr->rightchild=NULL;
                ptr->leftchild=NULL;
                printf("......In line 27 now!!\n");
                strcpy(ptr->data,word);
                printf("Copied successfully!!!..In line 29 now!!\n");
                ptr->num=num;
                fprintf(stdout,"New data '%s' successfully inserted into a new (right) node at %p (value of pointer)\n",word,ptr);
                break;
            }
        }

        else if(stricmp(word,ptr->data)<0){
            if(ptr->leftchild!=NULL){
                ptr=ptr->leftchild;
        printf("Moving to another (left) node now!!\n");
                continue;
            }
            else{
                printf("I need a new left child!!!\n");
                ptr->leftchild=malloc(sizeof(BSTnode));
                printf("New leftchild created successfully!!\n");
                if(!(ptr->leftchild))
                    return 1;
                ptr=ptr->leftchild;
                ptr->leftchild=NULL;
                ptr->rightchild=NULL;
                printf("......In line 47 now!!\n");
                strcpy(ptr->data,word);
                printf("Copied successfully!!!..In line 51 now!!\n");
                ptr->num=num;
        fprintf(stdout,"New data '%s' successfully inserted into a new (left) node at %p (value of pointer)\n",word,ptr);
                break;
            }
        }
        else{
            fprintf(stdout,"Nothing else to insert!!\n");
            return 0;
        }
    }

    return 0;
}

As you can see, with some print statements telling me where have I been, I can be sure where the program crash.

Any idea why it always crash at the third time?

#######################################################################3

main.c

#include<stdlib.h>
#include<stdio.h>
#include<string.h>
#include<stdbool.h>
#include "wordCount.h"

void prompt(BSTnode*,FILE*);
char arr[20]={0};

int main()
{
    BSTnode* firstNode=malloc(sizeof(BSTnode));
    firstNode->leftchild=NULL;
    firstNode->rightchild=NULL;
    strcpy(firstNode->data,"");
    firstNode->num=0;

    FILE* fs=fopen("testfile.txt","r");
    if(!fs){
        printf("Failed to open fiel!!\n");
        return 2;
    }

    while(1){
        if(ferror(fs))
            perror("there is a error in fs in the beginning of while loop!\n");

        prompt(firstNode,fs);
    }

        return 0;

}

void prompt(BSTnode* Node,FILE* fs){
    int i=0;     
    printf("Please select\n1.find and input a word into the binary tree\n2.print only one data\n3.Exit\n");

    if(scanf("%d",&i)!=1){
        printf("scanf failed!!\nplease input a valid number!!\n");
        //fflush(stdin);
        return;
    }
    getchar();
    switch(i){
        case 1:
            {
                memset(arr,'\0',20);        //since the "arr" is used to hold the newWord founded and returned, it should be clear first every time
                char* newWord=findWord(fs);       
                int totalNumberOfTheWord=wordCount(fs,newWord);
                inputWord(Node,newWord,totalNumberOfTheWord);                   
                break;
            }
        case 2:
            printOneNode(Node);
            break;
        case 3:
            exit(0);
        default:
            printf("Please input a valid number!(1-3)");
    }
}

Also, the wordCount.h:

#ifndef WORDCOUNT_H
#define WORDCOUNT_H
#include<stdlib.h>
#include<stdio.h>


typedef struct BSTnode{
    char data[20];
    struct BSTnode* leftchild;    //if less than, put it on the left
    struct BSTnode* rightchild;   //if greater than, on the right
    int num;

}BSTnode;

int inputWord(BSTnode*,char*,int);
char* findWord(FILE*);
int wordCount(FILE*,char*);
int printOneNode(BSTnode*);


#endif

The function prompt() is used to prompt the user to decide whether to continue word-searching.

#####################################################################3

full source code:

wordCount.c

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include "wordCount.h"


int wordCount(FILE* fs,char* word)
{
      int num=0;
      rewind(fs);
        size_t n1=sizeof(word);
        size_t n2=strlen(word);
    char* buff=malloc(n1) ;        
        if(buff==NULL)
            return 1;
        memset(buff,'\0',n1);

                /* I count the word by moving byte by byte and do comparison*/      
    if (fs != NULL) {                             
        if (n2 == fread(buff, 1,n2, fs)) {       

            do {                                   
                if (strnicmp(buff,word,n2) == 0) 
                    num++;                       
                memmove(buff, buff+1,n2-1);           
            } while (1 == fread(buff+n2-1, 1, 1, fs)); 
                                     // I think I might optimize 
                                                 // this using KMP algorithm
                }

    }

        free(buff);

        return num;
}

findWord.c

#include<string.h>
#include<stdio.h>
#include<stdbool.h>
#include<stdlib.h>
#include "wordCount.h"

extern char arr[20];
char* findWord(FILE* fs)
{

      static long pos=0;
      fseek(fs,pos,SEEK_SET);

        if(ferror(fs)){
            perror("fseek() failed!!!\n");
            fprintf(stderr,"fseek() failed in file %s\n",__FILE__);
            exit(EXIT_FAILURE);
        }
        char chr[1]={0};
        bool flag1=false;
        bool flag2=false;
        while((1==fread(chr,1,1,fs))&&(!(flag1==false&&flag2==true))){
                                        // This would make the findword() function
                                        // find only a single word once
            if(chr[0]!=32){
                strncat(arr,chr,1);
                flag2=true;
                flag1=true;
            }
            else
                flag1=false;
        }

  /*the key method that I use to find a new word is that I use two 'bool' flags: flag1 and flag2. 
  *Only when the "arr" is filled only with character, not a single space, will the flag1 be false and flag2 be true, thus breaking the while loop*/ 

        pos=ftell(fs)-1;  
                          //maybe everytime you use "fseek()", "ftell()", the
                                            //file-position will move one byte ahead. 
        return arr;
    }

printOneNode.c

#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include"wordCount.h"

int printOneNode(BSTnode* Node){
    BSTnode* ptr=Node;
    while(1){
        printf("Select which side of node do you want to print now(l/r)?(q for quit) ");
        char a;
        getchar();       //this is used to consume the newline character left
        //fflush(stdin);
        if(scanf("%c",&a)!=1){
            printf("scanf failed!!");
            return 1;
        }
        switch(a){
            case 'l':
                {
                    if(ptr->leftchild!=NULL){
                        ptr=ptr->leftchild;
                        printf("\t%s\n",ptr->data);
                    }
                    else
                        printf("There is no more leftchild\n");
                    break;
                }
            case 'r':
                {
                    if(ptr->rightchild!=NULL){
                        ptr=ptr->rightchild;
                        printf("\t%s\n",ptr->data);
                    }
                    else
                        printf("There is no more rightchild!\n");
                    break;
                }
            case 'q':
                return 0;
            default:
                return 0;
        }
    }
}

The function findWord() is used to find a new word for insertion. For example, if there is string This is a lovely place... in the textfile.txt, then the findWord() would first find out a word This and then is secondly and then a thirdly, etc. (This is the reason why I define the pos as a static variable to keep track of the location.)

The function wordCount() is used to count out how many time those the word returned by findWord() appear in the testfile.txt.

The function printOneNode() is used to print out the data of one single node according to the user's willingness. I designed this function but haven't use it yet, which mean that in the prompt() function I always choose to "find and input a new word into the binary search tree"). So this may not the reason that cause my program to crash "occasionally".

As summary, my routine is:

  1. prompt the user asking whether to find and insert a new word( always yes)
  2. find a new word in the testfile.txt using findWord()
  3. count the number using wordCount()
  4. insert it into the binary search tree using inputWord()

Repeat that.

I cannot make this program smaller any more to make it more understandable, because it have to find a word and count it insert it. But you can ignore that printOneNode() function, to some extent.

As for the testfile.txt, I have posted the link below at the comment area. Thanks

walkerlala
  • 1,599
  • 1
  • 19
  • 32
  • Have you tried using `valgrind` to have some hints on what could be happening? It gives great insights, and might be easier for you to use than for us to dig in a code we can't execute. – Jerska Jun 22 '15 at 03:52
  • 1
    In your loop/function, you assume that ptr != NULL (and Node!=NULL) but never check this, it would be a good thing to check. just in case... – AndersK Jun 22 '15 at 04:09
  • @Jerska Sadlly, I am on windows, without valgrind – walkerlala Jun 22 '15 at 04:20
  • 1
    @walkerlala if the problem still persist, could you post the implementation of your code? a.k.a main – Ediac Jun 22 '15 at 04:27
  • 2
    Before `strcpy(ptr->data,word);`, check `strlen(word) < sizeof ptr->data`. – chux - Reinstate Monica Jun 22 '15 at 04:36
  • @walkerlala - it's for just this reason that I like to maintain a virtual machine with a distro of linux in it when on windows, and for VS's debugger, I keep a Win virtual machine on my linux boxes. – enhzflep Jun 22 '15 at 04:50
  • @chux Have checked that yet but still crash. It always crash at the line `ptr->leftchild=malloc(sizeof(BSTnode));` Seems that it failed to allocate memory – walkerlala Jun 22 '15 at 05:01
  • Can anyone help to set a bounty for me? I cannot set it myself ( I don't know how) – walkerlala Jun 22 '15 at 05:12
  • [A bounty can be started on a question two days after the question was asked](https://stackoverflow.com/help/bounty) – user3386109 Jun 22 '15 at 05:20
  • Occasionally, a malloc will fail when there has a memory corruption on a previously allocated block. This may be a problem to check out. Something like an unsafe copy via `strcpy(ptr->data,word);` if this copy were not setup correctly. – Michael Dorgan Jun 22 '15 at 05:27
  • @Michael Dorgan How to check that previously allocated memory? – walkerlala Jun 22 '15 at 05:31
  • Can you use strncpy or just make sure your text file doesn't have any entries more than 20 characters (or that the string is properly terminated before copy.) Any of these could cause disaster, though I don't know if it would cause a malloc failure unless it were just barely over it size. – Michael Dorgan Jun 22 '15 at 05:34
  • To check previous memory, you could examine your BST node after it was completely loaded into memory with your debugger in a memory dump view. If you see it going past the end of your memory, you have an issue. – Michael Dorgan Jun 22 '15 at 05:35
  • @Michael Dorgan Interestingly, when debugged jn gdb, this proram work out fine, with no crash. – walkerlala Jun 22 '15 at 05:37
  • 1
    Points to undefined behavior - like assuming zero'd memory in one case but not another - or an array overrun. Or a race condition where one process isn't ready but not another. I'm also willing to be if you put a simple for loop at the beginning of your program and just malloc'd 100 BST structs, it wouldn't crash, though this would eliminate a true out of memory situation. – Michael Dorgan Jun 22 '15 at 05:40
  • 4
    Please study how to create an MCVE ([How to create a Minimal, Complete, and Verifiable Example?](http://stackoverflow.com/help/mcve)) SSCCE ([Short, Self-Contained, Correct Example](http://sscce.org/)) — two names and links for the same basic idea. Your code cannot be linked because you've not included `findWord()`, `wordCount()`, `printOneNode()`. While maybe we can guess what they're supposed to look like, we can't guess whether you've made any mistakes in them. At this stage, the problem is not reproducible. We shouldn't have to struggle to reproduce your problem. – Jonathan Leffler Jun 22 '15 at 06:24
  • @JonathanLeffler sorry. I have post all the source code and would edit it in a few minutes to make it more informative – walkerlala Jun 22 '15 at 06:56
  • I've compiled it — it compiles pretty cleanly once the missing semicolon is fixed (and that's praise from me). It crashes, though. Somewhat to my surprise, my version of [`valgrind`](http://www.valgrind.org/) was not working all that well on the code. I'm not sure quite what's supposed to be in `testfile.txt`; I slapped some words, one per line, into mine. When the file was empty, I got one crash; when the file was 'full', I got various different crashes. You've not echoed the inputs as you read them; that is a valuable technique that ensures that the program sees what you expect it to see. – Jonathan Leffler Jun 22 '15 at 07:17
  • @JonathanLeffler In my own code I did make some more test than it is in my post. But those extra test statement(only a few more `print` statement for test) look like a mess, which might not be so readable for you. I would continue to add more comment in my post.Thanks – walkerlala Jun 22 '15 at 07:24
  • @JonathanLeffler As for the `testfile.txt`: https://drive.google.com/file/d/0Bwi4SUSqm7wcaC1IQV9SUjdSSDQ/view?usp=sharing It is some work of william shakespeare. But note that I have formatted this file so that it contain only normal visible character and spaces. There is not a single newline character in this file – walkerlala Jun 22 '15 at 07:33
  • If any of you have any interest in my tiny word-searching program, which would crash sometimes due to some unknown reason, and if you may, please help set a bounty. – walkerlala Jun 22 '15 at 08:05
  • 1) `size_t n1=sizeof(word);` is wrong. `sizeof(word)` is poiner size != strlen(word) – BLUEPIXY Jun 22 '15 at 09:49
  • @BLUEPIXY Oh gosh!!!! I wanna kill myself!!! Thank you so much! – walkerlala Jun 22 '15 at 10:32

1 Answers1

4

edit: This is an amendment to my previous post (found below), detailing the more severe issues found in this code.

In wordCount there is a buffer overflow. Buffer overflows are UB.

  • You're allocating n1 bytes for buff to point at. By chance, do you happen to know how many bytes that is? Perhaps you should check, and then answer this to yourself: How many bytes can you store in that object?
  • You're then attempting to read n2 bytes into buff. Which is greater, n1 or n2? Have you looked at that? What happens if you try to fit 24 eggs into a carton that only holds 12?

I think the problem here is that you don't understand the sizeof operator; it isn't a function... Rather, it is an operator much like the &address-of and the -negation operator, except that sizeof operates on the type of (or denoted by) an expression; it evaluates to the size of objects of that type.

To clarify, in the following fragment of code, n1 is sizeof (char *), which is probably not what you intended.

int wordCount(FILE* fs,char* word)
{
    int num=0;
    rewind(fs);
    size_t n1=sizeof(word);
    size_t n2=strlen(word);
    char* buff=malloc(n1);    

inputWord seems to operate under the impression that word points to a string, however that value seems to come from findWord in your program, which doesn't necessary produce a string (because it uses strncat). More undefined behaviour! Is this surprising?


Previous answer:

Firstly, this code doesn't even compile. You're missing a semicolon immediately following inputWord(Node,newWord,totalNumberOfTheWord) within prompt. Perhaps you haven't noticed the errors, and you're running an out-of-date binary which we don't have the source code for?

Secondly, even if this code were to compile, there are a number of instances of undefined behaviour such as:

  • Null pointer dereferences occur when malloc returns NULL and you attempt to modify the object which NULL points to as a result. e.g. BSTnode* firstNode=malloc(sizeof(BSTnode)); followed immediately by firstNode->leftchild=NULL;. Perhaps you could declare firstNode like so: BSTnode firstNode = { 0 }; and create a pointer to it using &firstNode... After all, you really should choose the most appropriate storage duration rather than defaulting to malloc every time. On that note, I highly recommend separating your allocation logic from your data structure logic; if you need further elaboration, consider how scanf is designed.
  • fflush(stdin);. Whenever you use a function for the first time, you should always read and understand the manual very carefully... and that's not just to provide insight on how you should be designing your functions. If you had read and fully understood this fflush manual prior to using fflush, you would have never used this problematic code. Consider using something like scanf("%*[^\n]"); getchar(); in its place.
  • In a few places you're using the %p format directive, which expects a void * pointer as a corresponding argument. The corresponding argument you're providing, however, is of type struct BSTnode *. According to the fprintf manual, "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."

Even if you don't fix these undefined behaviours, this code may coincidentally work on your system when you provide dummy functions in place of findWord and wordCount. However, it's not required to work the same way on all systems, which means for you the crash might occur where for us it doesn't. Fix those problems.

Those problems indicate that your findWord and wordCount functions aren't necessarily trustworthy and foolproof, either; they might work for you in one setting whilst failing for you in another, or worse yet, perhaps they're stale too! You should have verified that the problem is where you think it is by providing dummy functions in their places. That is, after all, part of the process of creating an MCVE so that your question doesn't get closed.

No, I won't be interested in starting a bounty on this question because it's of extremely poor quality; as I previously mentioned, this question relies upon syntactically erroneous code compiling correctly, so we can't reproduce the result you see. Even if we fix the syntax errors, we'd have to fill in the blanks (that's your work) which introduces an aspect of uncertainty into any possible answers. About the only thing I am interested in starting for this question is the process of having it closed.

Community
  • 1
  • 1
autistic
  • 1
  • 3
  • 35
  • 80
  • I have post all the source code and would edit it in a few minutes to make it more informative. Sorry for no full source code previously.The reason why I didn't post all of that is, I fear that it's not proper to require other to read so many lines of code for me. (I once do that and receive lots of downvotes....) I would add more information to my code in a few minutes. – walkerlala Jun 22 '15 at 06:59
  • I use `fflush(stdin)` because the `scanf()` might take in the newline character left in the stdandard input. As for the `scanf("%*[^\n]")` you mentioned, I didn't use that because I am not so familiar with regular expression. I would try to use that afterward. But can you tell me what is the main risk of `fflush()`? – walkerlala Jun 22 '15 at 07:49
  • and also, the `%p` specifier is used for test, so I haven't paid so much attention to that...(I have turn on the gcc `-Wall` and `-Wextra` but no warning return). Also , when remove those statements, this program still crash sometimes. – walkerlala Jun 22 '15 at 07:55
  • The functionality of `fflush()` as [described by the Standard](http://port70.net/~nsz/c/c11/n1570.html#7.21.5.2) and as [described by MSDN](https://msdn.microsoft.com/en-us/library/9yky46tz.aspx) is different. If you want your code to be used in more than Windows, do not `fflush()` with input streams. Specifically, the Standard says `fflush()` with an input stream is Undefined Behaviour and MSDN says it "clears the contents of the buffer" (why they want that in the first place is beyond me: after all the contents of the buffer are regulated externally) – pmg Jun 22 '15 at 08:10
  • @walkerlala As I said, `fflush(stdin)` is **undefined behaviour**. Simply put, `fflush` doesn't operate on read-only streams. Did you read the link I pasted? It says "the `fflush` function causes any unwritten data for that stream to be delivered to the host environment to be written to the file"... **How does that make any sense for `stdin`?** *Flushing `stdin`* is metaphorically equivalent to pushing the flush button on your toilet and watching waste *appear* and perhaps even *start flooding the toilet*, rather than *disappear*. – autistic Jun 22 '15 at 08:39
  • @undefinedbehaviour I read that. I previosly wrongly think that I can use `fflush` to clean `stdin`. So I fixed that using `getchar()` you mentioned. But bug still exit. – walkerlala Jun 22 '15 at 08:55
  • 1
    @walkerlala It might be an exercise to read the MCVE page again and again until you *fully understand* it, as you should be doing with *many manuals*. You won't get far with C if you just keep guessing... – autistic Jun 22 '15 at 09:00
  • if operating on the same string, then the result of `strlen()` will be one byte less than that of `sizeof`, isn't it? – walkerlala Jun 22 '15 at 09:09
  • I really don't understand why the return value of `findWord()` is " not necessary a string(because it use `strncat()`). Could you elaborate on that please? – walkerlala Jun 22 '15 at 09:11
  • I think I have to set the `buff` using `memset(buff,'\0',n1)`, right? – walkerlala Jun 22 '15 at 09:15
  • really apprecite that you find out UB in the program for me – walkerlala Jun 22 '15 at 09:38
  • I know where my code go wrong. It is the `sizeof` part. – walkerlala Jun 22 '15 at 10:33
  • @walkerlala `sizeof` is an operator that tells you the size of the type (e.g. `sizeof word` in your code is the same value as `sizeof (char *)`, which is obviously nothing to do with the length of a string). – autistic Jun 22 '15 at 12:19
  • Thanks a lot. Sometimes forget. I am not so familiar with C yet.I have fixed that bug and it workout fine. Also, if you find any other potential bugs in my code, or any other UB, please let me know – walkerlala Jun 22 '15 at 12:38