
I have an input array A stored in memory, which is used to produce another, much bigger array B. Since B is huge, I don't want to keep it in memory; instead, I want to save it to a local file (using fwrite). To do this, on every iteration I calculate the ith row and append it to the output file. That way, I only need to hold one row in memory at a time, and in the end an output file is created with all the data I need.

The output file seems to have the proper size, considering the number of items it contains. Nevertheless, when I try to read fragments back from the output file using fread (for instance, to retrieve the first 2000 items), only the first 23 items are retrieved.

This is the primary function to create the output file:

void exportCovMatrix(char *outputString, double *inputStdMatrix, int colDim, int rowDim) {
    double *covRow = calloc(rowDim, sizeof(double));
    int i, j, n;
    FILE *output;
    fclose(fopen(outputString, "w"));
    output = fopen(outputString, "a");
    assert(covRow != NULL);
    assert(output != NULL);
    for (i = 0; i < rowDim; i++) {
        for (j = 0; j < rowDim; j++)
            covRow[j] = dotProduct(&inputStdMatrix[i * colDim], &inputStdMatrix[j * colDim], colDim);
        n = fwrite(covRow, sizeof(double), rowDim, output);
        assert(n == rowDim);
    }
    fclose(output);
    free(covRow);
}

This is another function, that reads the given output file:

double *calculateNextB(char* inputString, double* row, int dim){
    FILE* input = fopen(inputString, "r");
    int i, j;
    assert(input != NULL);
    for(i = 0; i <= dim; i++){
        j = fread(row, sizeof(double), dim, input);
        printf("%d items were read.\n", j);
    }
    ...
}

I'd appreciate any help in solving this issue. Thanks!

Ido
  • `fclose(fopen(outputString, "w"));` Suspicious line. What does it do? – Abhay Aravinda Apr 05 '20 at 15:23
  • It clears the file contents before we write anything new to the file. Originally, this line wasn't there, and I opened the file in write mode instead of append mode. In both ways I ended up with same issue. – Ido Apr 05 '20 at 15:27

3 Answers

You open the file for writing and for reading, respectively, with

fclose(fopen(outputString, "w"));

and

FILE* input = fopen(inputString, "r");

But as explained for example here

In order to open a file as a binary file a "b" character has to be included in the mode string.

(I know it is a C++ source, but this is true on some systems, although not on many POSIX systems, where the "b" has no effect, as explained in https://linux.die.net/man/3/fopen )
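
To illustrate the point about the mode string, here is a minimal sketch of a round trip that opens the file with "wb" and "rb". The helper names writeDoubles and readDoubles are hypothetical and are not part of the question's code. On systems that distinguish text and binary streams (Windows is the usual example), a text-mode read may stop early if the raw bytes of a double happen to look like an end-of-file marker, which would be consistent with only a few items being read back.

```c
#include <stdio.h>

/* Hypothetical helper: writes count doubles to path in binary mode.
 * Returns the number of items written, or 0 on failure. */
size_t writeDoubles(const char *path, const double *data, size_t count)
{
    FILE *out = fopen(path, "wb");   /* "b": no newline or EOF translation */
    if (!out) {
        perror("fopen for binary write failed");
        return 0;
    }
    size_t written = fwrite(data, sizeof *data, count, out);
    if (fclose(out) != 0) {          /* buffered data may not have reached the file */
        perror("fclose failed");
        return 0;
    }
    return written;
}

/* Hypothetical helper: reads up to count doubles from path in binary mode.
 * Returns the number of items read, or 0 on failure. */
size_t readDoubles(const char *path, double *data, size_t count)
{
    FILE *in = fopen(path, "rb");
    if (!in) {
        perror("fopen for binary read failed");
        return 0;
    }
    size_t got = fread(data, sizeof *data, count, in);
    fclose(in);
    return got;
}
```

Writing 2000 items with writeDoubles and reading them back with readDoubles should then return 2000 in both directions, regardless of the byte values inside the doubles.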

Roberto Caboni

I'd assume the file is really big.

On a 32-bit system, the stream-related functions (fopen, fwrite, etc.) are limited to files of 2 GiB unless large-file support is enabled. Beyond this size, the behavior of these functions is not defined.

Please refer to this page.

https://www.gnu.org/software/libc/manual/html_node/Opening-Streams.html#index-fopen64-931

Also refer to this question.

https://stackoverflow.com/questions/730709/2gb-limit-on-file-size-when-using-fwrite-in-c
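
To check whether this limit is even relevant, the expected file size can be estimated before writing. The sketch below assumes the output is a rowDim x rowDim matrix of doubles, as in the question; the function names covFileBytes and exceeds32BitOffset are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a rowDim x rowDim matrix of doubles, computed in
 * 64 bits so the product does not wrap on a 32-bit system.
 * (Overflow for extremely large rowDim is not handled in this sketch.) */
uint64_t covFileBytes(size_t rowDim)
{
    return (uint64_t)rowDim * (uint64_t)rowDim * (uint64_t)sizeof(double);
}

/* Returns 1 if the size does not fit in a signed 32-bit file offset. */
int exceeds32BitOffset(uint64_t bytes)
{
    return bytes > (uint64_t)INT32_MAX;
}
```

For example, assuming an 8-byte double, rowDim = 16384 gives 16384 x 16384 x 8 = 2,147,483,648 bytes, which is one byte more than a signed 32-bit offset can represent.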
Thermit

Regarding the following (slightly modified) proposed code, it:

  1. properly checks for errors
  2. avoids the use of assert() in (possibly) production code
  3. calculates each line of data, then writes that line to the file
  4. sets the file size back to 0 length each time this function is called
  5. properly indicates the file is a 'binary' file rather than a 'text' file
  6. does not compile, because no definition of the function dotProduct() was posted
  7. does not know the length of each row in inputStdMatrix[ rowDim ][ colDim ]
  8. raises a question: shouldn't the parameter double *inputStdMatrix be written as double inputStdMatrix[][ colDim ], with the parameters rowDim and colDim placed before it?
  9. properly limits the 'scope' of the local variables

And now, the proposed code:

#include <stdio.h>
#include <stdlib.h>


void exportCovMatrix(char *outputString, size_t colDim, size_t rowDim, double inputStdMatrix[][ colDim ])
{
    double *covRow = calloc(rowDim, sizeof(double));
    if( ! covRow )
    {
        perror( "calloc for row of data failed" );
        exit( EXIT_FAILURE );
    }

    FILE *output;
    output = fopen(outputString, "wb");
    if( ! output )
    {
        perror( "fopen for write binary file failed" );
        free( covRow );  // cleanup
        exit( EXIT_FAILURE );
    }

    // assert(covRow != NULL);
    // assert(output != NULL);

    for ( size_t i = 0; i < rowDim; i++) 
    {
        for ( size_t j = 0; j < rowDim; j++)
        {
            // with a 2D parameter, inputStdMatrix[i] is already a pointer to row i
            covRow[j] = dotProduct(inputStdMatrix[i],
                                   inputStdMatrix[j],
                                   colDim);
        }

        size_t n = fwrite(covRow, sizeof(double), rowDim, output);
        // assert(n == rowDim);
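        if( n != rowDim )
        {
            // handle error of short write
            perror( "fwrite of row of data failed" );
            fclose( output );  // cleanup
            free( covRow );
            exit( EXIT_FAILURE );
        }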
    }

    fclose(output);
    free(covRow);
}

This writes only rowDim rows to the file.

Then, if the function is called again, it erases what was already in the file, which is probably not what you want.
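
The difference between truncating and appending can be sketched as follows. The helpers writeRow and countDoubles are hypothetical names introduced only for this illustration; the only assumption is that the caller chooses between the modes "wb" (truncate, then write) and "ab" (append).

```c
#include <stdio.h>

/* Hypothetical helper: writes count doubles to path using the given mode.
 * "wb" discards any existing contents first, "ab" adds to the end.
 * Returns 0 on success, -1 on error. */
int writeRow(const char *path, const char *mode, const double *row, size_t count)
{
    FILE *out = fopen(path, mode);
    if (!out) {
        perror("fopen failed");
        return -1;
    }
    size_t n = fwrite(row, sizeof *row, count, out);
    int status = (n == count) ? 0 : -1;   /* short write counts as an error */
    if (fclose(out) != 0)
        status = -1;
    return status;
}

/* Hypothetical helper: counts the doubles currently stored in path.
 * Returns -1 if the file cannot be opened. */
long countDoubles(const char *path)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;
    double item;
    long count = 0;
    while (fread(&item, sizeof item, 1, in) == 1)
        count++;
    fclose(in);
    return count;
}
```

Calling writeRow twice with "ab" leaves both rows in the file, while a later call with "wb" leaves only the row written by that call.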

user3629249