I'm creating a function in C that checks for duplicate characters using a for loop that updates an array to say whether a character has been seen previously.

int duplicate(string key, int key_length)
{
    int seen[256];

    for (int i = 0; i < key_length; i++)
    {
        seen[(int)key[i]] += 1;
    }

    for (int i = 0; i < 256; i++)
    {
        printf("%i ", seen[i]);
    }

    return 1;
}

When I run the code with a test string (for example fhfkkdkdjrbrhrjrotorrjekwl), the output of the program is:

-268375615 32516 1286666352 32765 1286666368 32765 -268328471 32516 4 0 -268415848 32516 9 0 0 0 1 0 -268187160 32516 1286666436 32765 -268415848 32516 -268187160 32516 -268184328 32516 0 0 1287397800 32765 -268225857 32516 1 0 -1 0 1286666436 32765 -274155632 32516 -268419360 32516 1286666928 32765 1286666416 32765 1286666672 32765 -268309317 32516 -268191943 32516 -268301364 32516 -268191934 32516 -268374096 32516 1286666768 32765 7 0 7 8 -268376704 32516 -268187160 32516 -268321004 32516 9 0 -268318823 32516 1286666592 32765 -274072912 32516 -268419360 32516 0 0 1286666720 32765 0 0 0 0 0 0 -268376080 32516 -268184328 32516 1286666720 32765 -268378111 32516 -268375530 32517 -268376894 32516 -268189622 32516 3 4 2 0 -274155632 32518 -268375296 32516 1230 0 43 0 0 1 0 0 0 0 0 0 0 0 -274197488 32516 899256888 1615131 0 0 -268185200 32516 -8 -1 -268271904 32516 1286667280 32765 -268359765 32516 0 0 0 0 0 0 0 0 0 0 1 0 -268183808 32516 -274195584 32516 -268185248 32516 1286666753 32765 -268187160 32516 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 32 0 0 0 0 0 0 1 0 61765110 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 576 832 896 896 960 1472 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 2496 0 0 256 64 0 64 512 1024 0 0 0 0 0 0 0 0 0 0 0 0

However, when I step through the function in a debugger, it outputs:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 2 1 2 0 2 0 3 4 1 0 0 2 0 0 6 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

Which is what I expect it to do.

What am I doing wrong here, and why is it working fine in a debugger but not when the code runs normally?

Mark
  • enable all warnings and you'll see the problem right away – phuclv Feb 25 '21 at 13:16
  • The question about why the code behaves differently in a debugger cannot be deterministically answered without information about the software you are using and the switches you are using to build. If you are building the program differently for debugging than not, then the compiler switches involved may change some of the program data layout… – Eric Postpischil Feb 25 '21 at 13:40
  • … Some debuggers run “outside” the program being debugged, using operating system features to control the subject program. For those debuggers, whether a debugger is used or not should not affect program behavior, except for timing. But some debuggers may execute “inside” the program being debugged, and they are more liable to affect program behavior. I would expect that to be rarer now, though, in modern general-purpose operating systems. – Eric Postpischil Feb 25 '21 at 13:41
  • @EricPostpischil: It's also possible that a debug *build* happens to be using fresh, previously untouched stack space for this array, but an optimized build is using space that was previously dirtied. (MSVC will "poison" uninitialized memory with `memset(buf, 0xCC, size)` in debug builds, but other compilers don't.) – Peter Cordes Mar 02 '21 at 12:22

1 Answer

You invoked undefined behavior by reading the elements of the uninitialized non-static local variable int seen[256];, whose values are indeterminate. Anything is allowed to happen when undefined behavior is invoked.

Initialize it like this:

int seen[256] = {0};

to avoid this kind of error.
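For illustration, here is a minimal sketch of the whole check with that initialization applied. It is not the asker's exact function: the name has_duplicate, the const char * parameter (standing in for the string typedef used in the question) and the early return are assumptions made for this example. It also converts each character through unsigned char before indexing, because plain char may be signed and a negative value would index outside the array.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper for illustration: returns 1 if any character
   occurs more than once in the first key_length bytes of key, else 0. */
int has_duplicate(const char *key, size_t key_length)
{
    /* "= {0}" zero-initializes every element, so all counts start at 0. */
    int seen[UCHAR_MAX + 1] = {0};

    for (size_t i = 0; i < key_length; i++)
    {
        /* Go through unsigned char so the index is always in
           0..UCHAR_MAX, even where plain char is signed. */
        unsigned char c = (unsigned char)key[i];

        if (seen[c] > 0)
        {
            return 1; /* this character was already counted */
        }
        seen[c] += 1;
    }

    return 0;
}
```

With the test string from the question, has_duplicate("fhfkkdkdjrbrhrjrotorrjekwl", 26) returns 1, since the second f has already been counted by the time it is reached, while has_duplicate("abc", 3) returns 0.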

MikeCAT
  • The rule that makes accessing uninitialized objects of automatic storage duration undefined is in C 2018 6.3.2.1, and it does not apply to arrays because it only applies to objects that could have been declared `register`. So there is no undefined behavior here, just indeterminate values. – Eric Postpischil Feb 25 '21 at 13:54
  • @EricPostpischil: Really? So a `_Bool arr[10]` would have to get initialized by the compiler on most mainstream implementations where the ABI guarantees a `0` or `1` bit-pattern? ([Does the C++ standard allow for an uninitialized bool to crash a program?](https://stackoverflow.com/q/54120862)) – Peter Cordes Mar 02 '21 at 12:19
  • @EricPostpischil: https://godbolt.org/z/WWcv87 is the OP's code with `_Bool seen[]` instead of `int`. It shows GCC10.2 does *not* init the stack memory, so untouched elements will print as some unknown byte zero-extended to `int`, not necessarily 0 or 1. So GCC is basically giving us indeterminate *bit-patterns* (not valid _Bool values). *That* may be UB for bool specifically, although it's fine for most other types. – Peter Cordes Mar 02 '21 at 12:29
  • @PeterCordes: The C standard does not say it has to be initialized. It says the contents are indeterminate and does not say the behavior of accessing them is undefined. That means each use of an element from the array must behave as if it had some value (could be different on each use) but the program cannot trap or do arbitrary things just because of the use of the indeterminate values. – Eric Postpischil Mar 02 '21 at 14:25
  • @EricPostpischil: Ok, then GCC is violating the standard here, if the standard says the type value is indeterminate, rather than the bit-pattern. GCC can convert a `_Bool` to `int` and get `123` (or any other 8-bit zero-extended value), not just `0` or `1`. To prevent that, the only practical behaviour is to init arrays where not all bit-patterns are valid for the type. (The other sane as-if behaviour in this case would be to booleanize upon read, viable here since this _Bool array doesn't escape the function.) – Peter Cordes Mar 02 '21 at 18:09