
I am fairly experienced with C on embedded platforms, but I haven't used C very much with an OS. I am currently working on a Raspberry Pi 2.

I am working in C and need to make a utility that creates a CSV file from a portion of a binary file. The binary file contains many hours of data and is formatted as a series of 'blocks', each of which contains ~2000ms of data. The program iterates through the blocks and pulls the data until it reaches the end time.

The program works when I attempt relatively small binary-to-CSV conversions, and I can't identify any reason it shouldn't work with larger ones. When I run the program with MAX_SAMPLE_TIME_TO_CONVERT set to 180000, there are no issues in a normal run or under valgrind. When I change MAX_SAMPLE_TIME_TO_CONVERT to 200000, I get a segmentation fault. This only requires a malloc of 60kB of memory, which should be a breeze. When I run "free" from the command line, I have more than 500MB available.

When I run valgrind, I get somewhat cryptic output, but it tells me the line numbers where the problem occurs (I am building with the -g option), and those lines are exactly the ones that write to the malloc'd arrays:

            /* save the samples into the arrays 
             * that will become the CSV files */
            int32_t time = 0;
            for(int j = 0; (j < numOfSamples) && (time < endTime); j++){
                time = networkTime + (j * SAMPLE_INTERVAL_MS);
                timeArray[sampleIndex] = time;
                sampleArray[sampleIndex] = uncompressedBlockDataArray[j];
                sampleIndex++;
            }

full code:

int32_t startTime = getStartTime(argc, argv);
int32_t endTime = getEndTime(argc, argv);

/* calculate the amount of memory required to construct each array,
 * limiting the maximum amount in order to conserve memory */
uint32_t timeWindow = endTime - startTime;
if(verbose)
    printf("time window: %dms\n", timeWindow);

if(timeWindow > MAX_SAMPLE_TIME_TO_CONVERT){
    timeWindow = MAX_SAMPLE_TIME_TO_CONVERT;
    endTime = startTime + MAX_SAMPLE_TIME_TO_CONVERT;

    if(verbose){
        printf("warning: specified time window results in too many samples\n");
        printf("\ttime window truncated to %dms\n", timeWindow);
        printf("\tnew end time: %d\n", endTime);
    }
}

uint32_t numOfSamples = timeWindow/SAMPLE_INTERVAL_MS;

if(verbose)
    printf("each CSV file will contain up to %d samples (maximum of %dms)\n", numOfSamples, MAX_SAMPLE_TIME_TO_CONVERT);

/* allocate memory to temporarily store the 
 * data from the binary file as it is read */
int32_t *timeArray = (int32_t *)malloc(sizeof(uint32_t) * numOfSamples);
uint16_t *sampleArray= (uint16_t *)malloc(sizeof(uint16_t) * numOfSamples); 

if(verbose)
    printf("Allocating %d bytes for time and %d for samples\n", sizeof(uint32_t) * numOfSamples, sizeof(uint16_t) * numOfSamples);

if((timeArray == NULL) || (sampleArray == NULL)){
    printf("Not enough RAM, exiting...\n");
    return -1;
}

/* iterate through the SN array, saving each binary section to a CSV file  */
for(int i = 0; serialNumbersToExport[i] > 0; i++){
    uint32_t sampleIndex = 0;

    if(verbose)
        printf("\nAttempting binary-to-csv export of serial number %d...\n", serialNumbersToExport[i]);

    /* create the source file paths */
    char strSrcPath[DEFAULT_STR_LENGTH];
    snprintf(strSrcPath, DEFAULT_STR_LENGTH, "/home/updsys/data/SN%d.ubin", serialNumbersToExport[i]);
    if(verbose)
        printf("\tAttempting to access '%s'...\n", strSrcPath);

    /* open the source file */
    FILE *sourceF;
    sourceF = fopen(strSrcPath, "rb");

    if(sourceF != NULL){
        if(verbose)
            printf("\tSource binary found, proceeding...\n");

        /* find the starting point in the file, begin writing to the file
         * until you reach the end of the file or the end time specified */
        int32_t networkTime = 0;
        uint32_t fileByteOffset = 0;
        uint8_t blockHeaderArray[COMPRESSION_BLOCK_HEADER_LENGTH];
        uint8_t blockDataArray[MAX_BLOCK_SIZE_IN_BYTES];

        /* while time is less than end time OR we have reached the end of the file */
        while(networkTime < endTime){
            if(verbose)
                printf("\tbinary file offset: %d\n", fileByteOffset);

            fseek(sourceF, fileByteOffset, SEEK_SET);    // set read pointer to beginning of file

            /* when fread returns 0, break the loop */
            if(fread(blockHeaderArray, 1, COMPRESSION_BLOCK_HEADER_LENGTH, sourceF) == 0)
                break;

            fileByteOffset += COMPRESSION_BLOCK_HEADER_LENGTH;
            fseek(sourceF, fileByteOffset, SEEK_SET);

            networkTime = (uint32_t)blockHeaderArray[0]
                            + (((uint32_t)blockHeaderArray[1]) << 8)
                            + (((uint32_t)blockHeaderArray[2]) << 16)
                            + (((uint32_t)blockHeaderArray[3]) << 24);
            uint16_t numOfSamples = blockHeaderArray[4];
            uint16_t compressedWidth = blockHeaderArray[6];

            uint16_t numBytesToRead = getBlockNumOfBytes16(compressedWidth, numOfSamples);
            fread(blockDataArray, 1, numBytesToRead, sourceF);
            fileByteOffset += numBytesToRead;

            /* if the start time is less/equal to than the time at 
             * the end of the current block, then decompress and 
             * save the data */
            int32_t timeAtEndOfBlock = networkTime + (int32_t)(numOfSamples * SAMPLE_INTERVAL_MS);
            if(startTime <= timeAtEndOfBlock){
                if(verbose)
                    printf("\tstart time (%d) within block end time (%d), decompressing...\n", startTime, timeAtEndOfBlock);

                /* use to save single-block data to */
                uint16_t uncompressedBlockDataArray[(MAX_BLOCK_SIZE_IN_BYTES/2)] = {0};

                /* prepare to decompress */
                CompressionDataStruct16 compressionDataStruct;
                compressionDataStruct.sampleCount = numOfSamples;
                compressionDataStruct.compressedWidth = compressedWidth;
                compressionDataStruct.compressedData = blockDataArray;
                compressionDataStruct.uncompressedData = uncompressedBlockDataArray;
                decompressTo16(&compressionDataStruct);

                /* save the samples into the arrays 
                 * that will become the CSV files */
                int32_t time = 0;
                for(int j = 0; (j < numOfSamples) && (time < endTime); j++){
                    time = networkTime + (j * SAMPLE_INTERVAL_MS);
                    timeArray[sampleIndex] = time;
                    sampleArray[sampleIndex] = uncompressedBlockDataArray[j];
                    sampleIndex++;
                }
            }
        }

        if(verbose){
            printf("\t%d samples found, closing source binary file...\n", sampleIndex);
        }
        fclose(sourceF);

        /* if data was found, then write to CSV; otherwise move on */
        if(sampleIndex > 0){
            /* save the variables to '~/data/nodeNum.csv' */
            char strDestPath[DEFAULT_STR_LENGTH];
            snprintf(strDestPath, DEFAULT_STR_LENGTH, "/home/updsys/data/SN%d.csv", serialNumbersToExport[i]);

            FILE *f;
            f = fopen(strDestPath, "w");    // overwrite

            for(uint16_t j = 0; j < sampleIndex; j++){
                fprintf(f, "%d,%d\n", timeArray[j], sampleArray[j]);
            }

            fclose(f);

            if(verbose){
                printf("%d samples found, saving to %s\n", sampleIndex,strDestPath);
            }
        }else{

        }
    }else{
        if(verbose)
            printf("Source binary not found, moving on to next file...\n");
    }
}

/* free the memory */
free(timeArray);
free(sampleArray);
if(verbose)
    printf("\nfreeing memory...\n");

if(verbose)
    printf("program execution complete\n");
slightlynybbled
  • Please [do not cast the result of `malloc`](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc) – Eugene Sh. Nov 03 '15 at 19:32
  • what is this: `sampleArray[sampleIndex] = uncompressedBlockDataArray[j];**` <--- – Eugene Sh. Nov 03 '15 at 19:34
  • Typo - I was going to highlight that area and missed removing those – slightlynybbled Nov 03 '15 at 19:35
  • Should the for loop condition be `&& time < timeAtEndOfBlock`? ... and if not, is `numOfSamples` greater than `MAX_BLOCK_SIZE_IN_BYTES/2`? –  Nov 03 '15 at 19:36
  • I think that is probably the root cause. The decompression only works on 1 block and "numOfSamples" is too high and the end time is likely beyond the end of the block. Post as an answer and I will upvote it, thanks! – slightlynybbled Nov 03 '15 at 19:40
  • @Eugene Sh. Why shouldn't I cast the result of malloc? Is there some style guide or documentation that you can refer to for support? – slightlynybbled Nov 03 '15 at 19:41
  • @slightlynybbled You might have noticed my comment is containing a link – Eugene Sh. Nov 03 '15 at 19:42
  • The "full code" doesn't have a main function, etc. Please see [this page](http://stackoverflow.com/help/mcve) for how to post code, especially when you expect others to debug it for you. – M.M Nov 03 '15 at 21:04
  • You use the wrong format specifiers in `printf` regularly. `"%d"` is for `int`. For `uint32_t` the specifier is `PRIu32`, which is available via `#include <inttypes.h>`. – M.M Nov 03 '15 at 21:06
  • The loop you highlighted should also include `sampleIndex < numOfSamples` as a condition, and perhaps also `j < (MAX_BLOCK_SIZE_IN_BYTES/2)` unless the `decompressTo16` function (which you didn't post) performs this check itself – M.M Nov 03 '15 at 21:12
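
To illustrate the format-specifier point raised in the comments, here is a minimal sketch, assuming a C99 toolchain. `formatAllocReport` is a hypothetical helper invented for this example, not part of the posted program. It uses the `<inttypes.h>` macros for the fixed-width types and `%zu` for the `size_t` value produced by `sizeof`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a short report into buf using specifiers
 * that match the argument types. Returns whatever snprintf returns. */
static int formatAllocReport(char *buf, size_t bufSize,
                             uint32_t numOfSamples, int32_t endTime)
{
    assert(buf != NULL);

    return snprintf(buf, bufSize,
                    "samples: %" PRIu32 ", end time: %" PRId32
                    ", time bytes: %zu",
                    numOfSamples,                           /* uint32_t */
                    endTime,                                /* int32_t  */
                    sizeof(int32_t) * (size_t)numOfSamples); /* size_t   */
}
```

The same specifiers can be passed straight to `printf`; the buffer is used here only so the result is easy to inspect.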

1 Answer

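sampleIndex can grow past the number of elements allocated for timeArray and sampleArray. It is not reinitialised to 0 inside the inner loop while(networkTime < endTime), so it keeps increasing across every block that loop reads, and nothing checks it against the allocated size. Note also that the numOfSamples declared inside the while loop (the per-block count read from the header) shadows the outer numOfSamples used for malloc, so the condition j < numOfSamples only limits j to the block size.

Solution

Make sure that sampleIndex always stays below the allocated element count (the outer numOfSamples) in your inner for loop.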
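
One way to enforce that bound is sketched below. This is only an illustration under stated assumptions: `appendBlock` and its `capacity` parameter are hypothetical names, and the value of `SAMPLE_INTERVAL_MS` is a stand-in because the real one is not shown in the question. Unlike the posted loop, this version checks the sample time before storing, so it never stores a sample at or past endTime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; the question does not show the real value. */
#define SAMPLE_INTERVAL_MS 10

/* Append samples from one decompressed block to the output arrays.
 * Stops when the block is exhausted, when a sample time reaches endTime,
 * or when the output arrays (capacity elements each) are full.
 * Returns the updated sampleIndex. */
static size_t appendBlock(int32_t *timeArray, uint16_t *sampleArray,
                          size_t capacity, size_t sampleIndex,
                          const uint16_t *block, size_t blockCount,
                          int32_t networkTime, int32_t endTime)
{
    assert(timeArray != NULL && sampleArray != NULL && block != NULL);

    for (size_t j = 0; j < blockCount && sampleIndex < capacity; j++) {
        int32_t time = networkTime + (int32_t)j * SAMPLE_INTERVAL_MS;
        if (time >= endTime)
            break;                      /* past the requested window */
        timeArray[sampleIndex] = time;
        sampleArray[sampleIndex] = block[j];
        sampleIndex++;
    }
    return sampleIndex;
}
```

In the posted program, `capacity` would be the outer numOfSamples passed to malloc, and the call would replace the body of the inner for loop, with the return value assigned back to sampleIndex.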

jayant
  • While this information may be helpful, you are better off putting it in a comment, or improving this to include an answer. This only tells them what is wrong, not how to fix it. – user530873 Nov 03 '15 at 19:43
  • Thank you for your time. The 'sampleIndex' is intended so that data can be stored for the relatively large arrays 'timeArray' and 'sampleArray'. It is intended to be reset at the beginning of each array creation (top of the first for loop with 'i' as the index). As I'm writing this post, I'm realizing that a better design would involve reading the blocks of data directly into the CSV file rather than using a large chunk of memory as a simple buffer... – slightlynybbled Nov 03 '15 at 20:19
  • @slightlynybbled the point is that you need to make sure you do not exceed the array bounds of `timeArray` and `sampleArray`. – M.M Nov 03 '15 at 21:13
  • I agree, I need to ensure that I stay within the bounds of the arrays. This is the answer to my question, but the process of getting that answer made me realize that I was using much more RAM than was necessary to complete the task, and that a very slightly different architecture would avoid it. Thank you all! – slightlynybbled Nov 03 '15 at 21:21
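
The alternative design mentioned in the comments above, writing each block's samples directly to the CSV file instead of buffering everything in memory, could look roughly like this. It is a sketch, not the asker's final code: `writeBlockRows` is a hypothetical name, `SAMPLE_INTERVAL_MS` is a stand-in value, and skipping samples earlier than startTime is an assumption about the desired output.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in; the question does not show the real value. */
#define SAMPLE_INTERVAL_MS 10

/* Write the samples of one decompressed block that fall inside
 * [startTime, endTime) as "time,sample" rows. No large buffer is needed,
 * so there is no output array to overrun. Returns the rows written. */
static size_t writeBlockRows(FILE *csv, const uint16_t *block,
                             size_t blockCount, int32_t networkTime,
                             int32_t startTime, int32_t endTime)
{
    size_t rows = 0;

    for (size_t j = 0; j < blockCount; j++) {
        int32_t time = networkTime + (int32_t)j * SAMPLE_INTERVAL_MS;
        if (time < startTime)
            continue;                   /* before the requested window */
        if (time >= endTime)
            break;                      /* past the requested window */
        if (fprintf(csv, "%" PRId32 ",%" PRIu16 "\n", time, block[j]) < 0)
            break;                      /* write error: stop early */
        rows++;
    }
    return rows;
}
```

With this shape, the destination file would be opened before the while loop and `writeBlockRows` called once per decompressed block, so memory use no longer depends on the size of the time window.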