1

Is there a best-practice on how to feed data to fwrite() when the input data is held in a 2-dimentional array and each column has to be saved into a separate file?

The example below simulates data read from a file ("buffer") which is processed and parsed into an 8 rows x 3 columns array ("processedData"). A loop follows to create a new file pointer for each column which is populated by reading one column at the time and saving each file. This results in 3 files of 8 bytes each.

Is there a more efficient way to feed columnar data to fwrite() instead of looping through the array contents one column at the time?

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    uint8_t t, p, s;

    // Assume inputFile is a buffer with the contents of a file from disk

    uint8_t buffer[24] = {2, 1, 3, 3, 2, 4, 4, 3, 5, 5, 4, 6, 6, 5, 7, 7, 6, 8, 8, 7, 9, 9, 8, 0};

    uint8_t(*processedData)[8][3] = malloc(sizeof *processedData);
    memset(processedData, '\0', sizeof(*processedData));

    // Assume the contents of the buffer are processed in some way...

    for (s = 0; s < 24; s++)
        (*processedData)[s % 8][s % 3] = buffer[s];

    // Parse the processed content by column and save in 3 separate files

    for (t = 0; t < 3; t++)
    {
        uint8_t *hld = (uint8_t *)calloc(8, sizeof(uint8_t));

        for (p = 0; p < 8; p++)
            hld[p] = (*processedData)[p][t];

        char *fileName = (char *)malloc(sizeof(char) * 30);
        sprintf(fileName, "%s-%u", "TestFile", t);

        FILE *tmpFile = fopen(fileName, "w+");
        if (tmpFile == NULL)
        {
            fputs("File error", stderr);
            exit(1);
        }

        fwrite(hld, 1, 8, tmpFile);

        fclose(tmpFile);
        free(hld);
    }

    return 0;
}
  • 1
    Unrelated, but you forget to `free` the `fileName` you create. I'll argue you do too many dynamic allocations, some which are not needed (like the one for `hld`, where you can simply use an array, and why do you allocate `processedData` dynamically?). Also, in C you [don't have to (and really shouldn't) cast the result of `malloc` (or `calloc`)](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). – Some programmer dude Jul 22 '23 at 14:56
  • 2
    What's wrong with opening 3 files and doing 24 fwrites? They are buffered. – stark Jul 22 '23 at 15:06
  • Nothing wrong per se - just inquiring to see if more efficient ways could be used. – affluentbarnburner Jul 22 '23 at 16:44
  • Separately - this is just an example - in the work I do I deal with 1GB+ files - writing 1 byte at a time with fwrite() kills performance. Better to rely on fwrite() buffering and let it do its job. – affluentbarnburner Jul 22 '23 at 16:51
  • 8
    Since arrays in C are stored in memory in [row-major order](https://en.wikipedia.org/wiki/Row-_and_column-major_order), it is generally more efficient to use a row-major order access pattern than a column-major access pattern. – Andreas Wenzel Jul 22 '23 at 17:13
  • Benchmark to figure out what is slow before you make changes. Consider scattered output i.e. `writev`. "Best practice" usually boils down to opinion. – Allan Wind Jul 23 '23 at 04:48
  • When reading data, why not put the data that needs to be written to the same file on the same row? – Da Wang Jul 23 '23 at 09:03
  • Side note: In C, it is easier to handle arrays by using pointers that point to the first element of the array, instead of pointers to the entire array. If you write `uint8_t (*processedData)[3] = malloc( 8 * sizeof *processedData );` instead of `uint8_t (*processedData)[8][3] = malloc(sizeof *processedData);`, then accessing the elements in the array will be easier. For example, you can then write `hld[p] = processedData[p][t];` instead of `hld[p] = (*processedData)[p][t];` Note that my previous version of this comment had missing parentheses, so I have corrected and reposted the comment. – Andreas Wenzel Jul 23 '23 at 09:52
  • If you are looking for efficiencies, start by not copying the data unnecessarily. Creating the `processedData` array and copying everything from `buffer` to it is totally unnecessary in this toy. It is very likely unnecessary in your real program. Only keep one copy of the data, and manipulate it as little as possible. – William Pursell Jul 23 '23 at 11:48

1 Answers1

0

Your method can work, but is a bit cumbersome and has a few problems:

  • as the contents seems binary, the files should be open in binary mode with "wb".
  • there is no need for the + in "w+" that allows reading from the file in addition to writing.
  • memory allocation seems unnecessary for the filename and the hld array.
  • if you do allocate memory, you should check for allocation failure and make sure you free all allocated blocks.
  • using uint8_t for loop index variables is error prone.
  • distributing the 1D array into the 2D array should use s / 3 instead of s % 8 for the row index.
  • sprintf is not protected against buffer overflows: you should use snprintf() instead.

Regarding your question: Is there a more efficient way to feed columnar data to fwrite() instead of looping through the array contents one column at the time?

Since the columnar data is a not a contiguous set of bytes, you must iterate through the 2D array one way or another to collect the data for output. Note that there is no need to use fwrite, you could just use fputc for each byte directly, without the need for the intermediary hld array. Yet if the data elements are larger than a byte, or if your application is multithreaded, collecting the data in an array and using a single fwrite is probably easier and more efficient. Just make sure the output file is open in binary mode.

Here is a modified version:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ROWS  8
#define COLS  3

int main(int argc, char *argv[]) {
    // Assume inputFile is a buffer with the contents of a file from disk
    uint8_t buffer[ROWS * COLS] = { 2, 1, 3, 3, 2, 4, 4, 3, 5, 5, 4, 6, 6, 5, 7, 7, 6, 8, 8, 7, 9, 9, 8, 0 };
    uint8_t (*processedData)[COLS] = calloc(ROWS, sizeof *processedData);
    if (processedData == NULL) {
        fprintf(stderr, "cannot allocate %dx%d array: %s\n", ROWS, COLS, strerror(errno));
        exit(1);
    }
    // Assume the contents of the buffer are processed in some way...
    for (int s = 0; s < ROWS * COLS; s++) {
        processedData[s / COLS][s % COLS] = buffer[s];
    }
    // Parse the processed content by column and save in 3 separate files
    for (int t = 0; t < COLS; t++) {
        char fileName[30];
        uint8_t hld[ROWS];
        for (int i = 0; i < ROWS; i++) {
            hld[i] = processedData[i][t];
        }
        snprintf(fileName, sizeof fileName, "TestFile-%u", t);
        FILE *tmpFile = fopen(fileName, "wb");
        if (tmpFile == NULL) {
            fprintf(stderr, "cannot open %s: %s\n", fileName, strerror(errno));
            exit(1);
        }
        fwrite(hld, sizeof(*hld), ROWS, tmpFile);
        fclose(tmpFile);
    }
    free(processedData);
    return 0;
}

Note also that processedData could point directly to buffer and thus avoid the need for allocation and initialization:

    // use buffer as a packed 2D array
    uint8_t (*processedData)[COLS] = (uint8_t(*)[COLS])buffer;
chqrlie
  • 131,814
  • 10
  • 121
  • 189