0

I am a beginner in C with some experience in python and java. I want to solve a problem with C. The problem goes like this:

Take an input as a sentence with words separated by blank spaces only (assume lower case only), re-write the sentence with the following rules:

1) If a word occurs the first time, keep it the same.

2) If the word occurs twice, replace the second occurrence with the word being copied twice (e.g. two --> twotwo).

3) If the word occurs three times or more, delete all the occurrences after the second one.

Print the output as a sentence. The maximum lengths of the input sentence and each individual word is 500 chars and 50 chars.

Exemplary input: jingle bells jingle bells jingle all the way

Exemplary output: jingle bells jinglejingle bellsbells all the way

The approach I take is:

1) Read the input, separate each word and put them into an array of char pointers.

2) Use nested for loop to go through the array. For each word after the first word:

 A - If there is no word before it that is equal to it, nothing happens.

 B - If there is already one word before it that is equal to it, change the word as its "doubled form".

 C - If there is already a "doubled form" of itself that exists before it, delete the word (set the element to NULL.

3) Print the modified array.

I am fairly confident about the correctness of this approach. However, when I actually write the code:

'''

int main()
{

    char input[500];

    char *output[500];


    // Gets the input
    printf("Enter a string: ");
    gets(input);

    // Gets the first token, put it in the array
    char *token = strtok(input, " ");
    output[0] = token;

    // Keeps getting tokens and filling the array, untill no blank space is found
    int i = 1;
    while (token != NULL) {
        token = strtok(NULL, " ");
        output[i] = token;
        i++;
    }


    // Processes the array, starting from the second element
    int j, k;
    char *doubled;
    for (j = 1; j < 500; j++) {
        strcpy(doubled, output[j]);      
        strcat(doubled, doubled);        // Create the "doubled form"
        for (k = 0; k < j; k++) {
            if (strcmp(output[k], output[j]) == 0) {     // Situation B
                output[j] = doubled;
            }
            if (strcmp(output[k], doubled) == 0) {       // Situation C
                output[j] = ' ';
            }
        }
    }


    // Convert the array to a string
    char *result = output[0];          // Initialize a string with the first element in the array     
    int l;
    char *blank_space = " ";           // The blank spaces that need to be addded into the sentence
    for (l = 1; l < 500; l++) {
        if (output[l] != '\0'){        // If there is a word that exists at the given index, add it
            strcat(result, blank_space);
            strcat(result, output[l]);

        }
        else {                         // If reaches the end of the sentence
            break;
        }
    }

    // Prints out the result string
    printf("%s", result);

    return 0;
}

'''

I did a bunch of tests on each individual block. There are several issues:

1) When processing the array, strcmp, strcat, and strcpy in the loop seem to give Segmentation fault error reports.

2) When printing the array, the words did not show the order that they are supposed to do.

I am now frustrated because it seems that the issues are all coming from some internal structural defects of my code and they are very much related to the memory mechanism of C which I am not really too familiar with. How should I fix this?

  • 4
    Never ***ever*** use `gets`! It's a [dangerous](https://stackoverflow.com/questions/1694036/why-is-the-gets-function-so-dangerous-that-it-should-not-be-used) function that even have been removed from the C language. Use e.g. [`fgets`](https://en.cppreference.com/w/c/io/fgets) instead. – Some programmer dude Nov 04 '19 at 12:01
  • 3
    As for your problem, you really need to take a few steps back, and go back to your books, tutorials or classes and read more about pointers. A pointer must point somewhere valid for you to be able to use it! You have many uninitialized (and therefore indeterminate) pointers. – Some programmer dude Nov 04 '19 at 12:02
  • Yes, I got this warning when I was trying to run the program, but using gets is actually one of the requirements of this problem so I assume I should follow it regardless. – Curious student Nov 04 '19 at 12:03
  • Then the book or tutorial that gave you this assignment or exercise should be thrown out! And if it was a teacher, you need to tell him or her about it. Unless it's an exercise in buffer overflows and the dangers about them, never ever use `gets`, even if a teacher tells you to. – Some programmer dude Nov 04 '19 at 12:05
  • Compile with `-Wall -Wextra`. They will tell you what's wrong with `doubled` – klutt Nov 04 '19 at 12:05
  • Thanks! Is it char *double you are specifically referred to? I wonder if it is fine to just do char *double = " " as an initialization because I am changing it anyway? – Curious student Nov 04 '19 at 12:06
  • You need to read about strings and dynamic memory allocation. And as the dude said. You should NEVER use `gets` unless it's an exercise about the dangers of that function. This exercise is clearly not. – klutt Nov 04 '19 at 12:12
  • Better to use `char *double = NULL;`, or even better yet to use `char *double = malloc(someSizeValueForNumberOfDoublesNeeded);` – ryyker Nov 04 '19 at 14:02

2 Answers2

0

One issues jumps out at me. This code is wrong:

char *doubled;
for (j = 1; j < 500; j++) {
    strcpy(doubled, output[j]);      
    strcat(doubled, doubled);        // Create the "doubled form"

doubled doesn't point to any actual memory. So trying to copy data to where it points is undefined behavior and will almost certainly cause a SIGSEGV - and it will corrupt memory if it doesn't cause a SIGSEGV.

That needs to be fixed - you can't copy a string with strcpy() or strcat() to an pointer that doesn't point to actual memory.

This would be better, but still not ideal as no checking is done to ensure there's no buffer overflow:

char doubled[ 2000 ];
for (j = 1; j < 500; j++) {
    strcpy(doubled, output[j]);      
    strcat(doubled, doubled);        // Create the "doubled form"

This is also a problem with doubled defined like that:

        if (strcmp(output[k], output[j]) == 0) {     // Situation B
            output[j] = doubled;
        }

That just points output[j] at doubled. The next loop iteration will overwrite doubled, and the data that output[j] still points to to will get changed.

This would address that problem:

        if (strcmp(output[k], output[j]) == 0) {     // Situation B
            output[j] = strdup( doubled );
        }

strdup() is a POSIX function that, unsurprising, duplicates a string. That string will need to be free()'d later, though, as strdup() is the same as:

char *strdup( const char *input )
{
    char *duplicate = malloc( 1 + strlen( input ) );
    strcpy( duplicate, input );
    return( duplicate );
}

As pointed out, strcat(doubled, doubled); is also a problem. One possible solution:

    memmove(doubled + strlen( doubled ), doubled, 1 + strlen( doubled ) );       

That copies the contents of the doubled string to the memory starting at the original '\0' terminator. Note that since the original '\0' terminator is part of the string, you can't use strcpy( doubled + strlen( doubled ), doubled );. Nor can you use memcpy(), for the same reason.

Andrew Henle
  • 32,625
  • 3
  • 24
  • 56
0

Your code invokes undefined behavior in several places by writing to memory that is not yet owned, and is likely the cause of the segmentation fault you are seeing. For example, in this segment:

char *result = output[0];          // Initialize a string with the first element in the array     
    int l;
    char *blank_space = " ";           // The blank spaces that need to be addded into the sentence
    for (l = 1; l < 500; l++) {
        if (output[l] != '\0'){        // If there is a word that exists at the given index, add it
            strcat(result, blank_space);
            strcat(result, output[l]);

By itself...

char *result = output[0]; //creates a pointer, but provides no memory.

...is not sufficient to receive content such as

strcat(result, blank_space);
 strcat(result, output[l]);  

It needs memory:

char *result = malloc(501);//or more space if needed.
if(result)
{
    //now use strcat or strcpy to add content
ryyker
  • 22,849
  • 3
  • 43
  • 87