0

below is a small C application. It will ask you for a word to input. It stops asking when has attained four unique words. But in the form shown below it won't run properly until you uncomment the relevant lines.

#include <stdio.h>
#include <string.h>

#define WORDS_COUNT 4

int main()
{
    char* words[WORDS_COUNT];

    int words_added = 0;
    while (words_added<WORDS_COUNT)
    {

        puts ("\n-------enter a word-------");

        char response[250];

        scanf("%s", response);

        int i;
        int duplicate_flag = 0;
        for (i=0; i < words_added; i++)
        {
            if (strcmp(words[i], response) == 0)
            {
                duplicate_flag = 1;
                break;
            };
        };

        if (duplicate_flag == 0)
        {
            //char tmp[250];
            //strcpy(tmp, response);
            words[words_added] = response; //words[words_added] = tmp;
            puts("that's new!");
            words_added ++;
        } else {
            puts("you've said that already...");
        };

    };
    return 0;
};

The major difference as you can see is between words[words_added] = response and words[words_added] = tmp.

Why would the tmp variable work and not the response?

I'm guessing that response will have the exact same address every iteration, and tmp will get a new address every iteration. but why? yet they were both declared in same the while loop???

code shogan
  • 839
  • 2
  • 9
  • 25
  • You code is badly broken and exhibiting undefined behavior. Your problem is that you have not allocated any permanent storage for the list of existing words--only of the pointers to them. – dmckee --- ex-moderator kitten Feb 21 '12 at 21:59
  • [you seriously not want your program to be an entry point for buffer overflow exploitation](http://stackoverflow.com/a/456312/1025391)! – moooeeeep Feb 21 '12 at 22:23
  • @moooeeeep that would be a little paranoid here, it's just a little example code snippet, but point taken ;-) – code shogan Feb 22 '12 at 07:19

2 Answers2

2

When you assign words[words_added] = response you're copying the address (not the contents) of response into the array. So in the original form, your code should see the second word and every subsequent word as duplicates. When you use tmp the code compares each new response to the previous tmp that was stored in (every location of) words[], then copies it into tmp if it's not a duplicate.

So I suspect that your code will detect a duplicate that immediately follows the original, but not one that occurs 2 or more words later.

The words array contains 4 pointers, but no memory has been allocated to those pointers. You need to allocate memory for each element of the words array, and then copy each string into it:

if (duplicate_flag == 0)
{
  words[words_added++] = strdup(response); // allocates mem and copies the string
  puts("that's new!");
} else {
  ...
}

Then be sure to free the memory at the end of your program:

for (i = 0; i < words_added; ++i) {
  free(words[i]);
}
Adam Liss
  • 47,594
  • 12
  • 108
  • 150
  • "The tmp variable is allocated on the stack each time through the for loop, while response persists for the live of the program" yes but why does `tmp` get allocated and not `response`? – code shogan Feb 21 '12 at 22:05
  • You're absolutely right--good catch! I've revised my answer, and I think I've explained the behavior more accurately. Would you mind testing my suspicion that only immediate duplicates will be caught? If you can confirm that, I'm confident we understand what's happening. – Adam Liss Feb 22 '12 at 01:14
  • CONFIRMED! I can see exactly what you mean, there was no "re-allocation" but instead my code was only comparing `response` to the last "different" `response` that was copied into temp!. Why have I always assumed that re-allocations occur within loops? SMH. thanks! – code shogan Feb 22 '12 at 07:38
  • Since those variables are allocated on the stack I wouldn't be surprised if they were deallocated and reallocated during each iteration, but they'd likely end up in the same memory locations. Moreover, they'd probably be unchanged between iterations--but only by "accident" because the code doesn't happen to overwrite the memory between iterations. If you're serious about catching memory-related bugs, you can `memset` variables to zeroes (or invalid, easily-recognized values like `0xAA`) before you release them, to force errors to manifest faster. – Adam Liss Feb 22 '12 at 15:43
1

You're doing it wrong - you're pointing an array of pointers - words[words_added] - at a variable that changes on every iteration - response

You need to allocate storage for words[words_added] on each iteration, before you strcpy:
strcpy(words[words_added], response);

P.S. Don't put semi-colons after closing braces }

KevinDTimm
  • 14,226
  • 3
  • 42
  • 60
  • tmp and response are of the same type. so isn't tmp also changing every iteration? – code shogan Feb 21 '12 at 22:03
  • 1
    doesn't matter - it might have worked - but that was an anomaly - do the appropriate memory allocation `malloc` (and don't forget to deallocate `free` when done) – KevinDTimm Feb 21 '12 at 22:05
  • in the real program that's exactly what I've done, but I really want to understand why the compiler was giving `response` the special treatment. All other local variables in a while loop always get re-allocated, never seen a variable behave like `response` is doing. I really want to know why. – code shogan Feb 21 '12 at 22:09
  • 1
    @codeshogan see [this post](http://stackoverflow.com/a/6445794/1025391) and simply accept that you shouldn't do such thing. – moooeeeep Feb 21 '12 at 22:26
  • @moooeeeep Sometimes I like to be sneaky, when things fall apart and you examine what went wrong you learn more about how it all works. I was hoping to exploit the effect described in you link to see if I could get this (actually a similar) algorithm running entirely on stack... – code shogan Feb 22 '12 at 07:46
  • 1
    @codeshogan why don't you use a `char words[NUM_WORDS][MAX_WORD_LENGTH];` then? – moooeeeep Feb 22 '12 at 08:13
  • @moooeeeep I figured if I started to subscript strings and looping through them I would in effect be re-inventing the `string.h` wheel. – code shogan Feb 22 '12 at 10:03
  • @moooeeeep i also just found `alloca`, which does exactly what I wanted to do, BUT it has been pointed out that even if I were able to use it I would then run the risk of stack overflows. Safe to say I'm done with my experimenting, I know what not to do, and learnt about `alloca` on the way! (but wouldn't use `alloca` for this sort of situation, malloc/calloc instead). – code shogan Feb 22 '12 at 22:34