First, a little context. I am an electrical engineering student minoring in computer science, and I am almost entirely self-taught with very little rigorous training in programming, so there is likely going to be stuff that isn't "standard" in my code below.
This little program is just a utility that generates an arbitrary-size data file filled with randomly generated signed int values, to be used as input for an assignment. I have completed the assignment and it works fine. My question is about something strange (to me) that only started happening around the time I added the code that checks for duplicates. Before that, the program just dumped the ints straight to a file, one per line. Then I realized that wasn't strictly the way the professor's data would be formatted, so I changed it to prevent duplicates, put more than one int per line, and mix the whitespace delimiters (space, tab, newline).
OK, with all of that said, this works as long as I keep MAX_NUMBERS at about 32k or lower. If I make it higher, it displays the count nice and quickly until about 32k, then slows way down for a couple of hundred iterations or so, and then abruptly hangs at 32768. Because of that number, I thought it might have to do with the size of an int (I'm using Code::Blocks with the MinGW compiler), but sizeof(int) shows that it is 4 bytes, so that shouldn't cause it. I also thought I might be hitting a limit on the number of elements in an array, since the earlier version didn't use an array. My research indicates that this shouldn't be the cause, though. I know it's going to slow way down as the number of values it has to check for duplicates goes up, but I'm confused about why it just abruptly stops.
Lastly, I did try modifying it to use a larger C99 datatype instead of int, just as an experiment, but that did nothing.
If anyone happens to see anything dumb, besides using an array haha, please let me know! This is driving me a bit crazy.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    const int MAX_NUMBERS = 32000; // don't go higher than about 32000
    int* arr;
    // arr is used for duplicate checking: a log of everything put into the file is recorded
    // in arr and checked against to ensure uniqueness.
    const int ALLOW_NEG = 1; // switch to choose whether to allow negative numbers or not.
    int x = 0;      // the random number that was generated
    int index = 0;  // main loop control
    int index2 = 0; // dupe check loop control
    int hpos = 1;   // used to select which type of whitespace to add
    int uniNum = 1; // uniqueness flag
    FILE *f = fopen("nums.txt", "w"); // open the file for writing. creates it if it's not there.

    if (f == NULL) { // sanity check for the file
        printf("Error: Unable to open file. Program aborting.\n");
        exit(1);
    }

    arr = calloc(MAX_NUMBERS, sizeof(int)); // allocate space for the array
    if (arr == NULL) { // sanity check for the allocation
        printf("Error: Unable to allocate memory. Program aborting.\n");
        fclose(f);
        exit(1);
    }

    for (index = 0; index < MAX_NUMBERS; index++) // arr init loop
        arr[index] = 999999999; // init the array to an invalid value. initially was 0, but caused 0 to be omitted by the dupe checker

    printf("Generating data file...\n");
    srand(time(NULL)); // seed the random number generator
    fprintf(f, "%d\n", MAX_NUMBERS); // write the first line, the total number of ints in the file

    for (index = 0; index < MAX_NUMBERS; index++) { // main loop
        printf("\r%d", index); // just a display of the indices as the loop is running, useless for small counts, semi-useful for very large amounts (100k+)
        do { // check for unique number
            uniNum = 1; // set uniqueness flag
            if (ALLOW_NEG == 1) { // executed if negatives are allowed
                // This will allow 0, which makes sense if the
                // range includes negative and positive.
                x = (rand() % MAX_NUMBERS+1) -((MAX_NUMBERS+1)/2); // generate a random number between (-max_nums/2) and (max_nums/2), totaling max_nums. the +1 is a bug fix, ask if curious
            } else { // no negs allowed!
                // +1 makes the range from 1 to MAX_NUMBERS + 1,
                // change to zero or remove to range from 0 to MAX_NUMBERS
                x = (rand() % MAX_NUMBERS+1) + 1; // generate a random positive int (no negatives)
            }
            for (index2 = 0; index2 <= index; index2++) { // check currently generated numbers for dupes
                if (x == arr[index2]) { // dupe found!
                    uniNum = 0; // clear uniqueness flag
                    break;      // end the for loop on a dupe, no sense in continuing
                }
            }
        } while (uniNum != 1); // repeat if the number wasn't unique

        arr[index] = x; // log the number

        if (hpos > 4) { // check to see if the horizontal position indicator is greater than 4
            fprintf(f, "%d\n", x); // write to the 5th position horizontally with a newline
            hpos = 1; // reset the horizontal position to the first. this gives me 5 numbers
                      // per line, with differing types of whitespace, just to test the reading
                      // and storing function. see a2.txt
        } else {
            switch (hpos) { // select based on which position we are in
            case 1:
                fprintf(f, "%d ", x); // first, space
                hpos++;
                break;
            case 2:
                fprintf(f, "%d\t", x); // second, a tab character
                hpos++;
                break;
            case 3:
                fprintf(f, "%d ", x); // third, another space
                hpos++;
                break;
            case 4:
                fprintf(f, "%d\t", x); // fourth, another tab.. fifth is a newline
                hpos++;
                break;
            }
        }
    }

    printf("\n%d numbers generated\n", index); // eh, print it out. why not?
    fclose(f); // flush and close the output file
    free(arr); // release the duplicate-check log
    return 0;
}