
What I'm trying to make: I'm writing a program that analyzes any given set of numbers and reports some relevant information about it.

To make it useful beyond my own research, I'm writing it to generate a random array with any given number of elements. Later I plan to add an option for it to process a file containing its own set of numbers.

What my question is: When the program asks whether to repeat, change the array size, or quit, it only responds correctly to the second input, and I have no idea what could be causing this behavior.

Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

/*This program creates an array, calculates its standard deviation and identify its large element.*/
int DV(int n){ /*DV -> Compute the standard deviation, identify the large element and its respective position*/
  int i, l, p = 0;
  float x[n], MVx = 0, j = 0, k = 0;
  char rst;
  printf("Be a random array with %d elements, created from 0 to %d: \n", n, n-1);
  for(i = 0; i < n; i++){ //Generates an array with n random elements
    x[i] = rand() % n;
    MVx += x[i];
    printf("%.0f ", x[i]);
    }
  MVx /= n; // Here MV assumes the Arithmetic mean
  printf("\nThe arithmetic mean is: %.5f\n", MVx);
  for (i = 0; i < n; i++) {
    j = x[i] - MVx;
    j *= j;
    k += j;
  }
  printf("The standard deviation is %.5f\n\n", sqrt(k/n));
  j = 0; // j needs to be reseted
  for(i = 0, l = i + 1; i < n - 1, l < n; i++, l++){ //Compares all elements and puts its largest value on j
    switch (x[i] > x[l]){
      case 1: if(j < x[i]){ j = x[i]; p = i;} break;
      case 0: if(j < x[l]){ j = x[l]; p = l;} break;
      }
    }
  printf("The largest value on sample is: %.2f\nIt occurs for the first time on the %dº element\n", j, p+1);
  printf("\n\nRepeat? (y/n)\nA/a to change array size: ");
  while((getchar()) != '\n');
  rst = getchar();
  switch (rst) {
    case 'N':
    case 'n': return 1; break;
    case 'A':
    case 'a': return 2; printf("\n\n"); break;
    case 'Y':
    case 'y': return 3; printf("\n\n"); break;
    default: return -100;
    }
  }

int main(void){
  int n;
  char rst = 'a', qtd = 'a';
  while(rst == 'a') {
    if(qtd == 'a'){
      printf("Insert the amount of elements to be computed: ");
      scanf("%d", &n);
    }
    srand(time(NULL)); //Generates randomic seed
    DV(n);
    switch(DV(n)){
      case 3: printf("\n\n"); rst = 'a'; qtd = 'n'; break;
      case 2: printf("\n\n"); rst = qtd = 'a'; break;
      case 1: printf("\n\n"); return 0; break;
      default: printf("Invalid entry\n\n"); return 0;
    }
  }
}
Azgrom
  • You should use meaningful variable names and format your code so it's more readable. – Fiddling Bits Nov 17 '18 at 01:16
  • This line looks pretty suspicious: `while((getchar()) != '\n');` – 0x5453 Nov 17 '18 at 01:28
  • Separating your user interface from statistical calculations would be good practice. Then you can replace your user interface with say, a `.tsv` file, or a GUI, and you don't have to start again. – Neil Nov 17 '18 at 01:37
  • You have `DV(n);` on one line, then `switch(DV(n)){` on the next... that calls `DV()` twice in a row, and only acts on the return value for the second one. – Dmitri Nov 17 '18 at 01:44
  • If you don't need to access the data again, consider Welford's on-line algorithm, which doesn't need storage of the `n` elements, https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_Online_algorithm. – Neil Nov 17 '18 at 01:48
  • The `while((getchar()) != '\n');` is to substitute the `fflush(stdin);` I was using. Using `fflush(stdin);` doesn't change anything, by the way. The program doesn't even ask and runs twice. About the feedback on good programming practices, aye aye sirs. I'll keep that in mind. About what @Dmitri said... would it be possible that any return from the first DV() call would affect the subsequent `switch(DV(n)){`? If so there's any way I could clear any residue buffer/cache from the first DV() call? – Azgrom Nov 17 '18 at 15:36
  • @Azgrom yes, `fflush` is only defined on output streams. – Neil Nov 18 '18 at 19:39
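
Regarding the `fflush(stdin)` exchange in the comments above, here is a minimal sketch of one portable way to read a single-letter answer, assuming the reply is typed on its own line. The helper name `read_choice` and its `FILE *` parameter are invented for illustration and are not part of the original program.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: returns the first non-whitespace character found on
   the stream, then discards the rest of that line. Returns EOF if the
   stream ends before any such character is seen. */
static int read_choice(FILE *in) {
    int c, answer;
    assert(in != NULL);
    /* Skip blanks and leftover newlines, e.g. the one scanf("%d") leaves. */
    do {
        c = fgetc(in);
    } while (c != EOF && isspace(c));
    if (c == EOF) return EOF;
    answer = c;
    /* Throw away whatever else was typed on the same line. */
    while ((c = fgetc(in)) != EOF && c != '\n')
        ;
    return answer;
}
```

In the program above, `rst = read_choice(stdin);` could stand in for the `while((getchar()) != '\n'); rst = getchar();` pair, since it does not depend on how many newlines are still waiting in the input buffer.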

2 Answers


I don't know if this comes down to the two DV() calls that @Dmitri pointed out, but it is certainly another way to read what @Neil Edelman said.

When I took the user-interface code out of the DV() function and moved the prompt and the decision-making code into main(), the program worked as intended. After the move, DV() is also called only once per loop iteration.
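
As a hedged illustration of the two-call point raised by @Dmitri, the sketch below counts how often a stand-in function runs under each pattern. The names `fake_dv`, `calls`, `calls_with_bare_call` and `calls_with_stored_result` are invented for the example and do not appear in the program.

```c
#include <assert.h>

static int calls = 0; /* hypothetical counter, only here to make the effect visible */

/* Stand-in for DV(): pretends to show the menu and always answers 3 ("y"). */
static int fake_dv(void) {
    calls++;
    return 3;
}

/* Pattern from the question: the first call's result is thrown away. */
static int calls_with_bare_call(void) {
    calls = 0;
    fake_dv();              /* runs the whole function, result ignored */
    switch (fake_dv()) {    /* runs it again, only this result is used */
        case 3: break;
        default: break;
    }
    return calls;
}

/* Calling once and keeping the result. */
static int calls_with_stored_result(void) {
    int choice;
    calls = 0;
    choice = fake_dv();
    assert(choice == 3);    /* the stand-in always answers 3 */
    switch (choice) {
        case 3: break;
        default: break;
    }
    return calls;
}
```

With a real DV() that prompts the user, the first pattern would ask twice and act only on the second answer, which matches the behavior described in the question.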

About Welford's on-line algorithm: since it may behave strangely with large numbers of elements, I'll keep my own code, as it works fine up to somewhere between 2,090,000 and 2,100,000 elements. And I'm not using double...

The code that now works is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

/*This program creates an array, calculates its standard deviation and identifies its largest element.*/
void DV(int n){ /*DV -> Compute the standard deviation, identify the largest element and its position; returns nothing*/
  int i, l, p = 0;
  float x[n], MVx = 0, j = 0, k = 0;
  printf("Be a random array with %d elements, created from 0 to %d: \n", n, n-1);
  for(i = 0; i < n; i++){ //Generates an array with n random elements
      x[i] = rand() % n;
      MVx += x[i];
      printf("%.0f ", x[i]);
    }
   MVx /= n; // Here MV assumes the Arithmetic mean
   printf("\nThe Arithmetic Mean is: %.5f\n", MVx);
   for (i = 0; i < n; i++) {
      j = x[i] - MVx;
      j *= j;
      k += j;
    }
   printf("The standard deviation is: %.5f\n\n", sqrt(k/n));
   j = 0; // j needs to be reset
   for(i = 0, l = i + 1; i < n - 1, l < n; i++, l++){ //Compares all elements and puts its value on j
      switch (x[i] > x[l]){
        case 1: if(j < x[i]){ j = x[i]; p = i;} break;
        case 0: if(j < x[l]){ j = x[l]; p = l;} break;
        }
      }
   printf("The largest value on sample is: %.2f\nIt occurs for the first time on the %dº element\n", j, p+1);
}

int main(void){
  int n;
  char rst = 'a', qtd = 'a';
  while(rst == 'a') {
    if(qtd == 'a'){
      printf("Insert the amount of elements to be computed: ");
      scanf("%d", &n);
    }
    srand(time(NULL)); //Generates randomic seed
    DV(n);
    printf("\n\nRepeat? (y/n)\nA/a to change array size: ");
    while((getchar()) != '\n');
    rst = getchar();
    switch (rst) {
      case 'N':
      case 'n': return 0; break;
      case 'A':
      case 'a': printf("\n\n"); rst = qtd = 'a'; break;
      case 'Y':
      case 'y': printf("\n\n"); rst = 'a'; qtd = 'n'; break;
      default: printf("Invalid Entry\n\n"); return 0;
    }
  }
}
Azgrom

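When storing an array on the stack with a C99 variable-length array such as float x[n], be wary of stack overflow when n is large. The limit of roughly 2,100,000 elements reported in the other answer is consistent with this: 2,097,152 four-byte floats occupy 8 MiB, which is a common default stack size on Linux. See "How to declare and use huge arrays of 1 billion integers in C?" and https://wiki.sei.cmu.edu/confluence/display/c/MEM05-C.+Avoid+large+stack+allocations.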
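
A minimal sketch of the heap-based alternative, assuming the samples are still drawn with `rand() % n` as in the question. The helper name `make_samples` is made up for this example, and the caller is assumed to `free()` the result.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated array of n values in
   [0, n), or NULL if the allocation fails. The caller frees it. */
static float *make_samples(size_t n) {
    float *x;
    size_t i;
    assert(n > 0);
    x = malloc(n * sizeof *x);   /* heap, so not limited by the stack size */
    if (x == NULL) return NULL;  /* a failure can be reported, unlike a VLA */
    for (i = 0; i < n; i++)
        x[i] = (float)((size_t)rand() % n);
    return x;
}
```

Unlike `float x[n]`, a failed `malloc` returns NULL, so the program can print a message and ask for a smaller size instead of crashing.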

If RAND_MAX is 2,147,483,647, x = rand() returns a [quasi-]uniform distribution with E[x] ≈ 1,073,741,823. Assuming an IEEE 754 32-bit float, every integer is exactly representable only up to 16,777,216; 16,777,217 is the first one that is not (see "Which is the first integer that an IEEE 754 float is incapable of representing exactly?"). When one adds n replicas, the sum grows with n while a float keeps only about 24 significant bits, so the relative precision left for each new term drops roughly as 1/n. When n is large, this matters, too. In physics labs, we were always reminded that it's almost always appropriate to use double to manipulate measurements; this article gets into why it matters (for time, but also in general): https://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/.

This code is an implementation of Welford's online algorithm, as described at https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford%27s_Online_algorithm, except in C. The advantage is that you don't need to know the number of elements in advance and you don't have to store them (it's on-line and uses O(1) memory); it's also numerically more stable than accumulating a potentially large sum.

#include <stdlib.h> /* EXIT_ size_t */
#include <stdio.h>  /* printf */
#include <math.h>   /* sqrt */

/** Measurement. C version of Python
 \url{ https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_Online_algorithm }. */
struct Mx {
    size_t count;
    double mean, ssdm;
};

static void mx_reset(struct Mx *const measure) {
    if(!measure) return;
    measure->count = 0;
    measure->mean  = 0;
    measure->ssdm  = 0;
}

static void mx_add(struct Mx *const measure, const double replica) {
    size_t n;
    double delta;
    if(!measure) return;
    n     = ++measure->count;
    delta = replica - measure->mean;
    measure->mean += delta / n;
    measure->ssdm += delta * (replica - measure->mean);
}

static double mx_mean(const struct Mx *const measure) {
    if(!measure || !measure->count) return NAN;
    return measure->mean;
}

static double mx_sample_variance(const struct Mx *const measure) {
    if(!measure || measure->count <= 1) return NAN;
    return measure->ssdm / (measure->count - 1);
}

static double mx_population_variance(const struct Mx *const measure) {
    if(!measure || !measure->count) return NAN;
    return measure->ssdm / measure->count;
}

/** This is the example from
 \url{ https://en.wikipedia.org/wiki/Standard_deviation }. */
int main(void) {
    const float fulmars_f[] = { 727.7f, 1086.5f, 1091.0f, 1361.3f, 1490.5f,
        1956.1f }, fulmars_m[] = { 525.8f, 605.7f, 843.3f, 1195.5f, 1945.6f,
        2135.6f, 2308.7f, 2950.0f };
    const size_t fulmars_f_size = sizeof fulmars_f / sizeof *fulmars_f,
        fulmars_m_size = sizeof fulmars_m / sizeof *fulmars_m;
    struct Mx f, m;
    size_t i;
    mx_reset(&f), mx_reset(&m);
    /* Converts float -> double. */
    for(i = 0; i < fulmars_f_size; i++) mx_add(&f, fulmars_f[i]);
    for(i = 0; i < fulmars_m_size; i++) mx_add(&m, fulmars_m[i]);
    printf("female breeding Northern fulmars\nmean:\t%f.\nstddev:\t%f\n"
        "population stddev: %f\n\nmale breeding Northern fulmars\n"
        "mean:\t%f.\nstddev:\t%f\npopulation stddev: %f\n", mx_mean(&f),
        sqrt(mx_sample_variance(&f)), sqrt(mx_population_variance(&f)),
        mx_mean(&m), sqrt(mx_sample_variance(&m)),
        sqrt(mx_population_variance(&m)));
    return EXIT_SUCCESS;
}

In this example the input is a constant array instead of a user interface, but doing for(i = 0; i < n; i++) mx_add(&m, rand() % n); for some input n would separate your user interface from the statistical calculations.
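
A compact sketch of that separation, assuming the same update rule as `mx_add` above. The names `struct Running`, `running_add`, `population_variance_of` and `population_variance_random` are invented for this example, and the functions return the variance so the caller can take `sqrt` and do all the printing.

```c
#include <assert.h>
#include <math.h>   /* NAN */
#include <stddef.h>
#include <stdlib.h> /* rand */

/* Hypothetical compact accumulator, mirroring struct Mx above. */
struct Running { size_t count; double mean, ssdm; };

static void running_add(struct Running *r, double value) {
    double delta;
    assert(r != NULL);
    delta = value - r->mean;
    r->count++;
    r->mean += delta / (double)r->count;
    r->ssdm += delta * (value - r->mean);
}

/* Statistics only: no printing, no prompting. */
static double population_variance_of(const double *values, size_t n) {
    struct Running r = { 0, 0.0, 0.0 };
    size_t i;
    for (i = 0; i < n; i++) running_add(&r, values[i]);
    return r.count ? r.ssdm / (double)r.count : NAN;
}

/* Same accumulator fed from rand() % n, without storing the samples. */
static double population_variance_random(size_t n) {
    struct Running r = { 0, 0.0, 0.0 };
    size_t i;
    for (i = 0; i < n; i++) running_add(&r, (double)((size_t)rand() % n));
    return r.count ? r.ssdm / (double)r.count : NAN;
}
```

A `main()` that reads n, calls `population_variance_random(n)` and prints the square root of the result would then hold all of the user interface, and could later be swapped for a file reader without touching the statistics.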

Neil
  • Hey man. Sorry for not giving you any answer earlier. I was waiting until I had the time to read all you put here to comment anything, and I want to thank you! Your answer did help me to understand a bit more of those things! – Azgrom Dec 03 '18 at 00:42