1

I'm stuck with a program where just having a printf statement is causing changes in the output.

I have an array of n elements. For the median of every d consecutive elements, if the (d+1)th element is greater or equals to twice of it (the median), I'm incrementing the value of notifications. The complete problem statement might be referred here.

This is my program:

#include <math.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define RANGE 200

float find_median(int *freq, int *ar, int i, int d) {
    int *count = (int *)calloc(sizeof(int), RANGE + 1);
    for (int j = 0; j <= RANGE; j++) {
        count[j] = freq[j];
    }
    for (int j = 1; j <= RANGE; j++) {
        count[j] += count[j - 1];
    }
    int *arr = (int *)malloc(sizeof(int) * d);
    float median;
    for (int j = i; j < i + d; j++) {
        int index = count[ar[j]] - 1;
        arr[index] = ar[j];
        count[ar[j]]--;
        if (index == d / 2) {
            if (d % 2 == 0) {
                median = (float)(arr[index] + arr[index - 1]) / 2;
            } else {
                median = arr[index];
            }
            break;
        }
    }
    free(count);
    free(arr);
    return median;
}

int main() {
    int n, d;
    scanf("%d %d", &n, &d);
    int *arr = malloc(sizeof(int) * n);
    for (int i = 0; i < n; i++) {
        scanf("%i", &arr[i]);
    }
    int *freq = (int *)calloc(sizeof(int), RANGE + 1);
    int notifications = 0;
    if (d < n) {
        for (int i = 0; i < d; i++)
            freq[arr[i]]++;
        for (int i = 0; i < n - d; i++) {
            float median = find_median(freq, arr, i, d);   /* Count sorts the arr elements in the range i to i+d-1 and returns the median */
            if (arr[i + d] >= 2 * median) {      /* If the (i+d)th element is  greater or equals to twice the median, increments notifications*/
                printf("X");
                notifications++;
            }
            freq[arr[i]]--;
            freq[arr[i + d]]++;
        }
    }
    printf("%d", notifications);
    return 0;
}

Now, For large inputs like this, the program outputs 936 as the value of notifications whereas when I just exclude the statement printf("X") the program outputs 1027 as the value of notifications. I'm really not able to understand what is causing this behavior in my program, and what I'm missing/overseeing.

Passer By
  • 19,325
  • 6
  • 49
  • 96
Saksham
  • 59
  • 8
  • 6
    `for(int j=0;j<=RANGE;j++){ count[j]+=count[j-1]; }`... Now what is the contents of `count[j-1]` in the first iteration? You might be interested in learning about [*undefined behavior*](https://en.wikipedia.org/wiki/Undefined_behavior). – Some programmer dude Sep 25 '17 at 15:58
  • 1
    One way `printf()` can affect results is by stomping on the data that was a variable in another function where a pointer to the now out of scope data was returned. It can also alter memory allocation patterns and make a bug more visible – Jonathan Leffler Sep 25 '17 at 16:02
  • 1
    Another problem is uninitialized variables: the garbage they end up containing depends on previous stack history, which can be drastically affected by `printf` or not. – Steve Summit Sep 25 '17 at 16:05
  • @Someprogrammerdude Thanks for pointing out the mistake. However, even if the iteration starts with j=1, the problem persists – Saksham Sep 25 '17 at 16:05
  • 2
    Allocating with `d` makes more sense than `RANGE`. Suggest `int *count = calloc(d, sizeof *count);` Also reverse operands, drop cast, use sizeof *pointer. – chux - Reinstate Monica Sep 25 '17 at 16:31
  • @JonathanLeffler So, what can be done to ensure printf() doesn't overwrite the 'now out of scope variable's data'? – Saksham Sep 25 '17 at 16:32
  • 3
    Compile with all warnings and debug info (`gcc -Wall -Wextra -g`). Use the debugger. Read about [undefined behavior](https://en.wikipedia.org/wiki/Undefined_behavior) – Basile Starynkevitch Sep 25 '17 at 16:35
  • Code will not find the _median_ using a single `for()` loop with random ordered input. – chux - Reinstate Monica Sep 25 '17 at 16:36
  • 1
    You avoid the problem by not trying to use now out of scope data. Once it is out of scope, it is no longer usable. If the `printf()` is scribbling over it, then it is doing you a favour and revealing UB — undefined behaviour. – Jonathan Leffler Sep 25 '17 at 16:36
  • 4
    We really shouldn't have to go and read the [HackerRank Challenge "fraudulent activity notifications"](https://www.hackerrank.com/challenges/fraudulent-activity-notifications) to understand what your code is doing. You should include the relevant information in your question. It is reasonable to reference the challenge for extra information, but your question should be comprehensible without that. – Jonathan Leffler Sep 25 '17 at 16:39
  • @BasileStarynkevitch Thanks for the reference to 'undefined behavior'. Also, checked for warnings, and there were none – Saksham Sep 26 '17 at 04:16
  • @JonathanLeffler Thanks for the good advice. I've updated/edited my question to make it more comprehensive. However, I've kept back the link to original problem statement, in case anyone wants to dig deeper. – Saksham Sep 26 '17 at 05:24
  • You should use more the debugger `gdb` and its watchpoints. Use also [valgrind](http://valgrind.org/). Read much more about *undefined behaviopr* and work hard to avoid it. With UB the behavior of a program written in C is unpredictable and it is pointless to understand it (without diving into gory implementation details and into machine code) – Basile Starynkevitch Sep 26 '17 at 05:58
  • @chux Can you please explain how can 'dropping the cast' and 'reversing the operand' be useful? – Saksham Sep 26 '17 at 07:59
  • 1
    @Saksham Dropping the cast is [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself), keeping the cast is [WET](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself#DRY_vs_WET_solutions). By reversing, the variables passed match the intent of `calloc(size_t nmemb, size_t size);` – chux - Reinstate Monica Sep 26 '17 at 13:06

2 Answers2

6

Your program has undefined behavior here:

for (int j = 0; j <= RANGE; j++) {
    count[j] += count[j - 1];
}

You should start the loop at j = 1. As coded, you access memory before the beginning of the array count, which could cause a crash or produce an unpredictable value. Changing anything in the running environment can lead to a different behavior. As a matter of fact, even changing nothing could.

The rest of the code is more difficult to follow at a quick glance, but given the computations on index values, there may be more problems there too.

For starters, you should add some consistency checks:

  • verify the return value of scanf() to ensure proper conversions.
  • verify the values read into arr, they must be in the range 0..RANGE
  • verify that int index = count[ar[j]] - 1; never produces a negative number.
  • same for count[ar[j]]--;
  • verify that median = (float)(arr[index] + arr[index - 1]) / 2; is never evaluated with index == 0.
chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • Yeah, I noticed this mistake and corrected it. Then also, the same problem continues. – Saksham Sep 25 '17 at 16:17
  • @Saksham: I do not understand what the code does, but there are many places for more undefined behavior. Answer updated. – chqrlie Sep 25 '17 at 16:28
  • I did the consistency checks but everything seems alright. For the first check, I computed the `sum` of all `scanf()` returns and that matches the value expected. For the second one, the input I'm giving contains all value in range `0..RANGE`. I did the rest of the checks using conditional statements and found no issue. Thanks anyway. – Saksham Sep 26 '17 at 04:57
4

Your program has undefined behavior (at several occasions). You really should be scared (and you are not scared enough).

I'm really not able to understand what is causing this behavior in my program

With UB, that question is pointless. You need to dive into implementation details (e.g. study the generated machine code of your program, and the code of your C compiler and standard library) to understand anything more. You probably don't want to do that (it could take years of work).

Please read as quickly as possible Lattner's blog on What Every C Programmer Should Know on Undefined Behavior

what I'm missing/overseeing.

You don't understand well enough UB. Be aware that a programming language is a specification (and code against it), not a software (e.g. your compiler). Program semantics is important.

As I said in comments:

  • compile with all warnings and debug info (gcc -Wall -Wextra -g with GCC)

  • improve your code to get no warnings; perhaps try also another compiler like Clang and work to also get no warnings from it (since different compilers give different warnings).

  • consider using some version control system like git to keep various variants of your code, and some build automation tool.

  • think more about your program and invariants inside it.

  • use the debugger (gdb), in particular with watchpoints, to understand the internal state of your process; and have several test cases to run under the debugger and without it.

  • use instrumentation facilities such as the address sanitizer -fsanitize=address of GCC and tools like valgrind.

  • use rubber duck debugging methodology

  • sometimes consider static source code analysis tools (e.g. Frama-C). They require expertise to be used, and/or give many false positives.

  • read more about programming (e.g. SICP) and about the C Programming Language. Download and study the C11 programming language specification n1570 (and be very careful about every mention of UB in it). Read carefully the documentation of every standard or external function you are using. Study also the documentation of your compiler and of other tools. Handle error and failure cases (e.g. calloc and scanf can fail).

Debugging is difficult (e.g. because of the Halting Problem, of Heisenbugs, etc...) - but sometimes fun and challenging. You can spend weeks on finding one single bug. And you often cannot understand the behavior of a buggy program without diving into implementation details (studying the machine code generated by the compiler, studying the code of the compiler).

PS. Your question shows a wrong mindset -which you should improve-, and misunderstanding of UB.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
  • The address sanitizer `-fsanitize=address` helped me find several out-of-bounds and use-after-free bugs. Valgrind helped me to find 'uninitialized value' errors and memory leaks. – Saksham Sep 26 '17 at 10:23
  • The blog you suggested was really really informative. Thanks for it. – Saksham Sep 26 '17 at 10:24
  • Then upvote or accept my answer. But I was expecting that `gcc -Wall -Wextra -g` would also be helpful – Basile Starynkevitch Sep 26 '17 at 10:25