(C++) Fastest way possible for reading in matrix files (arbitrary size)

Question

I'm developing a bioinformatic tool, which requires reading in millions of matrix files (average dimension = (20k, 20k)). They are tab-delimited text files, and they look something like:

0.53  0.11

0.24  0.33

Because the software reads the matrix files one at a time, memory is not an issue, but reading is very slow. The following is my current function for reading in a matrix file. I first make a matrix object using a double pointer, then fill in the matrix by looping through the input file.

float** make_matrix(int nrow, int ncol, float val){
    float** M = new float *[nrow];
    for(int i = 0; i < nrow; i++) {
        M[i] = new float[ncol];
        for(int j = 0; j < ncol; j++) {
           M[i][j] = val;
        }
    }
    return M;
}


float** read_matrix(string fname, int dim_1, int dim_2){

    float** K = make_matrix(dim_1, dim_2, 0);

    ifstream ifile(fname);
    for (int i = 0; i < dim_1; ++i) {
        for (int j = 0; j < dim_2; ++j) {
            ifile >> K[i][j];
        }
    }

    ifile.clear();
    ifile.seekg(0, ios::beg);
    return K;
}

Is there a much faster way to do this? In my experience with Python, reading in a matrix file using pandas is much faster than using Python for-loops. Is there a trick like that in C++?

(added)

Thanks so much everyone for all your suggestions and comments!

Must your matrix object be a `float**`? That type requires many calls to `new`, even though you know the exact number of floats you need. — Drew Dormann, Sep 09 '21 at 20:25
Reasoning for above: It's one of the slowest and most error-prone ways to make a matrix. [Here's a really simple matrix](https://stackoverflow.com/a/2076668/4581301) that is a lot easier to get right and often noticeably faster — user4581301, Sep 09 '21 at 20:27
Every call to `ifstream>>` incurs some overhead. It's likely substantial in this case. — 3Dave, Sep 09 '21 at 20:28
Don't use `float[][]`. Use a single dim array `float* matrix = new...`. `malloc()` and `new` are very expensive calls. You're making at least 20k of them. — 3Dave, Sep 09 '21 at 20:30
Side note: There doesn't seem to be a good reason to `clear` and `seekg` at the end of the function. The stream is about to go out of scope and disappear, so who gives a crap about its positioning and state? Shouldn't be particularly expensive, but it's still wasted code. — user4581301, Sep 09 '21 at 20:31
Trust no one. `ifile >> K[i][j];` could fail hilariously and it's not checked. You owe it to your future self to at least check `ifile`'s state at the end of the function before signing off on the matrix and returning it. — user4581301, Sep 09 '21 at 20:33
And, while you're at it: use your profiler. Find the worst offenders in terms of execution time, and start optimizing *there*. Randomly "fixing" things that might not be the problem is never the correct approach. — 3Dave, Sep 09 '21 at 20:33
Thanks everyone! I'll use 1D array as suggested. But how do I avoid using "ifstream>>" for each individual matrix elements? I see how "ifstream>>" can cause some overhead and errors, but is there any way around it? — JWO, Sep 09 '21 at 21:39
Could you share a few lines of the typical input files for profiling? Anything special you know about those values? For example, do they all start with `0.`? Is the fix-point notation used for each value (no scientific notation)? Do you know the number of digits? — Vlad Feinstein, Sep 09 '21 at 23:09
These are some examples of numbers in an example matrix: "7.5387e-05 0.00102271 0.00639386 0.00773087". But since I'll be reading and writing the matrices, I have a full control of what they will look like. I'm okay with losing some precision if it boosts the speed significantly — JWO, Sep 10 '21 at 01:52

Jeremy Friesner · Answer 1 · 2021-09-09T22:41:48.470

2

Just for fun, I measured the program posted above (using a 20,000x20,000 ASCII input file, as described) on my Mac Mini (3.2GHz i7 with SSD drive) and found that it took about 102 seconds to parse in the file using the posted code.

Then I wrote a version of the same function that uses the C stdio API (fopen()/fread()/fclose()) and does character-by-character parsing into a 1D float array. This implementation takes about 13 seconds to parse in the file on the same hardware, so it's about 7 times faster.

Both programs were compiled with g++ -O3 test_read_matrix.cpp.

float* faster_read_matrix(string fname, int numRows, int numCols)
{
    FILE * fpIn = fopen(fname.c_str(), "r");
    if (fpIn == NULL)
    {
       printf("Couldn't open file [%s] for input!\n", fname.c_str());
       return NULL;
    }

    float* K = new float[numRows*numCols];

    // We'll hold the current number in (numberBuf) until we're ready to parse it
    char numberBuf[128] = {'\0'};
    int numCharsInBuffer = 0;

    int curRow = 0, curCol = 0;
    while(curRow < numRows)
    {
       char tempBuf[4*1024];  // an arbitrary size
       const size_t bytesRead = fread(tempBuf, 1, sizeof(tempBuf), fpIn);
       if (bytesRead == 0)
       {
          // fread() returns size_t, so an error shows up via ferror(), never as a negative count
          if (ferror(fpIn)) perror("fread");
          break;
       }

       for (size_t i=0; i<bytesRead; i++)
       {
          const char c = tempBuf[i];
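          // 'e'/'E' keep values in scientific notation (e.g. 7.5387e-05) together as one token
          if ((c=='.')||(c=='+')||(c=='-')||(c=='e')||(c=='E')||(isdigit((unsigned char)c)))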
          {
             if ((numCharsInBuffer+1) < sizeof(numberBuf)) numberBuf[numCharsInBuffer++] = c;
             else
             {
                printf("Error, number string was too long for numberBuf!\n");
             }
          }
          else
          {
             if (numCharsInBuffer > 0)
             {
                // Parse the current number-chars we have assembled into (numberBuf) and reset (numberBuf) to empty
                numberBuf[numCharsInBuffer] = '\0';
                if (curCol < numCols) K[curRow*numCols+curCol] = strtod(numberBuf, NULL);
                else
                {
                    printf("Error, too many values in row %i!  (Expected %i, found at least %i)\n", curRow, numCols, curCol);
                }
                curCol++;
             }
             numCharsInBuffer = 0;

             if (c == '\n') 
             {
                curRow++;
                curCol = 0;
                if (curRow >= numRows) break;
             }
          }
       }
    }
    fclose(fpIn);

    if (curRow != numRows) printf("Warning:  I read %i lines in the file, but I expected there would be %i!\n", curRow, numRows);

    return K;
}
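
A variation on the same idea, offered only as a rough sketch: read the whole file into memory with a single fread() and let strtod() walk through the text, which also handles scientific notation such as 7.5387e-05. The function name read_matrix_whole_file and the std::vector return type are assumptions made for illustration, not part of the answer above. It also assumes the file fits in memory and that its size fits in a long (which is only 32 bits on some platforms). strtod() is used rather than strtof() because of the timing difference reported in the comments below.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: loads a whitespace-delimited text matrix into a
// row-major std::vector<float>. Returns an empty vector on any error.
std::vector<float> read_matrix_whole_file(const std::string& fname,
                                          std::size_t numRows,
                                          std::size_t numCols)
{
    std::FILE* fp = std::fopen(fname.c_str(), "rb");
    if (fp == nullptr) return {};

    // Determine the file size so everything can be read with one fread() call.
    std::fseek(fp, 0, SEEK_END);
    const long fileSize = std::ftell(fp);
    std::fseek(fp, 0, SEEK_SET);
    if (fileSize < 0) { std::fclose(fp); return {}; }

    std::vector<char> text(static_cast<std::size_t>(fileSize) + 1);
    const std::size_t got =
        std::fread(text.data(), 1, static_cast<std::size_t>(fileSize), fp);
    std::fclose(fp);
    text[got] = '\0';  // strtod() needs a NUL-terminated string

    const std::size_t expected = numRows * numCols;
    std::vector<float> values;
    values.reserve(expected);

    const char* p = text.data();
    while (values.size() < expected) {
        char* end = nullptr;
        const double v = std::strtod(p, &end);  // skips leading tabs/newlines
        if (end == p) break;                    // no further number found
        values.push_back(static_cast<float>(v));
        p = end;
    }
    if (values.size() != expected) return {};   // file was too short or malformed
    return values;
}
```

Element (row, col) is then values[row * numCols + col]. Whether this beats the chunked version depends on the platform, so it is worth measuring before adopting it.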

edited Sep 09 '21 at 22:41

answered Sep 09 '21 at 22:11

Jeremy Friesner


Please don't use `atof`, its behavior can go undefined when given inappropriate input. `strtod` is much better and fails gracefully. – Ben Voigt Sep 09 '21 at 22:14
Okay, I changed it to call `strtof()` instead. – Jeremy Friesner Sep 09 '21 at 22:18
How fast would fscanf be? – tstanisl Sep 09 '21 at 22:22
@tstanisl if it's a `fscanf(fpIn, "%f", &val);` to scan one number at a time, I'd expect it to be about the same as `strtof()`, or slightly slower (since it has to do the same work, but it also has to parse the formatting-string to know what work it has to do first). If OTOH it's a `fscanf(fpIn, "%f %f %f %f [...]", &val1, &val2, &val3, [...], &val20000)`, to parse the entire 20,000 values on a line using one call, that's not an experiment I'm willing to try :) – Jeremy Friesner Sep 09 '21 at 22:25
@BenVoigt I just noticed something interesting -- on my computer, running the program using `atof(numberBuffer)` or `strtod(numberBuffer)` completes in about ~13 seconds, while swapping that out for `strtof(numberBuffer)` causes it to complete in ~32 seconds. It looks like `strtof()` is significantly slower than `atof()` or `strtod()` – Jeremy Friesner Sep 09 '21 at 22:39
OP is using C++, this is C, why? – Sep 09 '21 at 22:39
@KPCT it's C++, calling into a C API, because I want it to run quickly. (if you think it's C, try compiling it with a C compiler and note how you get a syntax error on 'new float') – Jeremy Friesner Sep 09 '21 at 22:40
It looks like the speed will depend on the [average?] length of those floats; what did you use in your example? – Vlad Feinstein Sep 09 '21 at 22:57
Vlad I used random numbers between 0.00 and 1.00, formatted as shown in the OP’s example. – Jeremy Friesner Sep 10 '21 at 00:52
Interesting... That's what I did, but it took a little over ONE second (on my HP laptop, Windows). Running your code. MS Visual Studio 2019 – Vlad Feinstein Sep 10 '21 at 01:27
@VladFeinstein note that my input file is 20,000 lines long and each line has 20,000 values in it to parse (file size is about 1.9GB) – Jeremy Friesner Sep 10 '21 at 01:49
@JeremyFriesner I thought mine was too... Lost some zeroes on the way :) – Vlad Feinstein Sep 10 '21 at 16:28

score 2 · Accepted Answer · answered Sep 09 '21 at 23:17

2

The fastest way, by far, is to change the way you write those files: write them in a binary format, two ints first (width, height), then just dump your values.

You will be able to load it in just three read calls.
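
A minimal sketch of what such a format could look like, under a few assumptions that are not in the answer: the two dimensions are stored as std::uint32_t, the values are 4-byte floats in row-major order, and files are written and read on machines with the same endianness. The names BinMatrix, write_matrix_bin and read_matrix_bin are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical in-memory representation for the sketch.
struct BinMatrix {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::vector<float> values;  // row-major, width * height elements
};

// Writes width, height, then the raw float payload.
bool write_matrix_bin(const char* path, const BinMatrix& m)
{
    if (m.values.size() != static_cast<std::size_t>(m.width) * m.height) return false;
    std::FILE* fp = std::fopen(path, "wb");
    if (fp == nullptr) return false;
    bool ok = std::fwrite(&m.width, sizeof m.width, 1, fp) == 1
           && std::fwrite(&m.height, sizeof m.height, 1, fp) == 1
           && std::fwrite(m.values.data(), sizeof(float), m.values.size(), fp)
                  == m.values.size();
    ok = (std::fclose(fp) == 0) && ok;  // fclose() can report a failed flush
    return ok;
}

// Loads the file back with three fread() calls: width, height, payload.
bool read_matrix_bin(const char* path, BinMatrix& m)
{
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr) return false;
    bool ok = std::fread(&m.width, sizeof m.width, 1, fp) == 1      // read 1
           && std::fread(&m.height, sizeof m.height, 1, fp) == 1;   // read 2
    if (ok) {
        // Note: resize() zero-fills the vector before fread() overwrites it.
        m.values.resize(static_cast<std::size_t>(m.width) * m.height);
        ok = std::fread(m.values.data(), sizeof(float), m.values.size(), fp)
                 == m.values.size();                                // read 3
    }
    std::fclose(fp);
    return ok;
}
```

Because the raw bytes are stored, the values round-trip exactly, and a 20,000 x 20,000 float matrix takes 8 + 1,600,000,000 bytes on disk.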

answered Sep 09 '21 at 23:17

Vlad Feinstein


Dúthomhas · Answer 3 · 2021-09-10T18:29:01.147

I am dissatisfied with Jeremy Friesner’s otherwise excellent answer because it:

- blames C++'s I/O system for the problem (which is not where the problem lies)
- fixes the problem by working around the actual I/O bottleneck without being explicit about how much that bottleneck contributes to the run time
- changes the memory layout, which may or may not contribute to speed, and does so in a way that may not support very large matrices

The reason his code runs so much faster is because he removes the single most important bottleneck: unoptimized disk access. JWO’s original code can be brought to match with three extra lines of code:

float** read_matrix(std::string fname, int dim_1, int dim_2){

    float** K = make_matrix(dim_1, dim_2, 0);
    
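    constexpr std::size_t buffer_size = 4*1024;     // 1 (constexpr: a plain size_t would make `buffer` a non-standard VLA)
    char buffer[buffer_size];                       // 2

    std::ifstream ifile;
    ifile.rdbuf()->pubsetbuf(buffer, buffer_size);  // 3 (set before open(): libstdc++ ignores it on an already-open file)
    ifile.open(fname);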
    
    for (int i = 0; i < dim_1; ++i) {
        for (int j = 0; j < dim_2; ++j) {
            ifile >> K[i][j];
        }
    }

//    ifile.clear();
//    ifile.seekg(0, std::ios::beg);
    return K;
}

The addition exactly replicates Friesner’s design, but using the C++ library capabilities without all the extra programming grief on our end.

You’ll notice I also removed a couple lines at the bottom that should be inconsequential to program function and correctness, but which may cause a minor cumulative time issue as well. (If they are not inconsequential, that is a bug and should be fixed!)

How much difference this all makes depends entirely on the quality of the C++ Standard Library implementation. AFAIK the big three modern C++ compilers (MSVC, GCC, and Clang) all have sufficiently-optimized I/O handling to make the issue moot.

locale

One other thing that may also make a difference is to .imbue() the stream with the classic "C" locale, which avoids a lot of special handling for numbers in locale-dependent formats other than what your files use. You only need to bother to do this if you have changed your global locale, though.

ifile.imbue(std::locale::classic());

redundant initialization

Another thing that costs time is the effort to zero-initialize the array when you create it. Don’t do that if you don’t need it! (You don’t need it here because you know the total extents and will fill them properly. Since C++11, a failed extraction writes zero to its target, but once the stream is in a failed state every later extraction leaves its target untouched, so check the stream state after the loop instead of relying on leftover zeros.)
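
As a rough illustration of skipping the zero fill while still detecting a short or malformed file, here is a sketch that allocates with a plain new float[count] (which leaves the floats uninitialized) and checks every extraction. The function name read_matrix_unzeroed and the flat, row-major layout are assumptions made for this example, not something taken from the question's code.

```cpp
#include <cstddef>
#include <fstream>
#include <memory>
#include <string>

// Hypothetical helper: reads `count` whitespace-separated floats from a text
// file into a flat array. Returns nullptr if the file cannot be opened or
// runs out of valid numbers before `count` values were read.
std::unique_ptr<float[]> read_matrix_unzeroed(const std::string& fname,
                                              std::size_t count)
{
    std::ifstream in(fname);
    if (!in) return nullptr;

    // new float[count] default-initializes, i.e. performs no zero fill
    // (unlike new float[count]() or std::vector<float>(count)).
    std::unique_ptr<float[]> values(new float[count]);

    for (std::size_t i = 0; i < count; ++i) {
        // A failed extraction leaves the stream in a failed state, so stop
        // at the first one rather than continuing with unread elements.
        if (!(in >> values[i])) return nullptr;
    }
    return values;
}
```

For a dim_1 x dim_2 matrix the caller would pass count = dim_1 * dim_2 and index the result as values[i * dim_2 + j].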

dynamic memory block size

Finally, keeping memory accesses to an array of arrays should not significantly affect speed, but it still might be worth testing if you can change it. This assumes that the resulting matrix will never be too large for the memory manager to return as a single block (a failed allocation that is not handled will crash your program).

A common design is to allocate the entire array as a single block, with the requested size plus size for the array of pointers to the rest of the block. This allows you to delete the array in a single delete[] statement. Again, I don’t believe this should be an optimization issue you need to care about until your profiler says so.
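
One way the single-block design could look, as a sketch only: the row pointers sit at the front of the block and the float data follows, so the result can still be indexed as M[i][j]. The names make_matrix_single_block and free_matrix_single_block are hypothetical. The sketch uses operator new / operator delete on raw storage rather than delete[], and it relies on the common practice of treating that raw storage as arrays of trivial types.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: one allocation holding nrow row pointers followed by
// nrow * ncol floats. The float values are left uninitialized.
float** make_matrix_single_block(std::size_t nrow, std::size_t ncol)
{
    // The float data starts right after the pointer table, so the table's
    // size must keep the floats correctly aligned.
    static_assert(sizeof(float*) % alignof(float) == 0,
                  "float data after the pointer table would be misaligned");

    const std::size_t ptrBytes  = nrow * sizeof(float*);
    const std::size_t dataBytes = nrow * ncol * sizeof(float);

    void* block = ::operator new(ptrBytes + dataBytes);  // throws std::bad_alloc on failure
    float** rows = static_cast<float**>(block);
    float*  data = reinterpret_cast<float*>(static_cast<char*>(block) + ptrBytes);

    // Point each row at its slice of the contiguous data area.
    for (std::size_t i = 0; i < nrow; ++i) {
        rows[i] = data + i * ncol;
    }
    return rows;
}

// Releases the whole matrix (pointer table and data) in one call.
void free_matrix_single_block(float** M)
{
    ::operator delete(M);
}
```

Since the data area is contiguous, M[0] can also be passed to code that expects a flat float* of nrow * ncol elements.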

I don't mind being critiqued, but if you can explain how my program avoids accessing the disk, I'd like to hear it. I'm reading in 1.9GB of ASCII text from the disk in my tests. — Jeremy Friesner, Sep 10 '21 at 17:43
Er, I never said _it doesn’t access the disk_. I said that _disk access_ was the issue. You fix it by buffering large reads from the disk (nice job!). I used that same fix in this answer, just with C++ instead of C. I hope my answer doesn’t come across as bashing you. — Dúthomhas, Sep 10 '21 at 17:55
I see. I think "he removes the single most important bottleneck: disk access" could probably be rewritten to be more explicit. — Jeremy Friesner, Sep 10 '21 at 18:06

BitTickler · Answer 4 · 2021-09-10T21:47:55.433

At the risk of the answer being considered incomplete (no code examples), I would like to add some further options for tackling the problem:

- Use a binary format (width, height, values...) as the file format and then use file mapping (MapViewOfFile() on Windows, mmap() on POSIX/Unix systems). Then you can simply point your "matrix structure" pointer to the mapped address space and you are done. If you access the matrix sparsely, it can even save some real I/O. If you always access all elements of the matrix (no sparse matrices etc.), it is still quite elegant and probably faster than malloc/read.
- Replace C++ iostream, which is known to be quite slow and should not be used for performance-critical work. Have a look at the {fmt} library, which has become quite popular in recent years and claims to be quite fast.

Back in the days when I did a lot of numerics on large data sets, I always opted for binary files for storage. That was when the fastest CPU you could get your hands on was the Pentium 1 (with the floating point bug :)). Back then everything was slower, memory was much more limited (we had MB, not GB, as units for RAM in our systems), and all in all nearly 20 years have passed since.

So, as a refresher, I wrote some code to show how much faster than iostream and text files you can go if you do not have extra constraints (such as the endianness of different CPUs etc.).

So far, my little test only has an iostream version and a binary file version with a) stdio fread() loading and b) mmap(). Since I sit in front of a Debian bullseye computer, my code uses Linux-specific calls for the mmap() approach. To run it on Windows, you have to change a few lines of code and some includes.

Edit: I added a save function using {fmt} now as well.
Edit: I added a load function with stdio now as well.
Edit: To reduce memory workload, I reordered the code somewhat and now only keep 2 matrix instances in memory at any given time.

The program does the following:

1. Create a 20k x 20k matrix in RAM (in a struct named Matrix_t), filled with random values, slowly generated with <random>.
2. Write the matrix with iostream to a text file.
3. Write the matrix with stdio to a binary file.
4. Create a new matrix textMatrix by loading its data from the text file.
5. Create a new matrix inMemoryMatrix by loading its data from the binary file with a few fread() calls.
6. mmap() the binary file and use it under the name mappedMatrix.
7. Compare each of the loaded matrices to the original randomMatrix to see if the round-trip worked.

Here are the results I got on my machine after compiling this work of wonder with clang++ -O3 -o fmatio fast-matrix-io.cpp -lfmt:

./fmatio
creating random matrix (20k x 20k) (27.0775seconds)
the first 10 floating values in randomMatrix are:
57970.2 -365700 -986079 44657.8 826968 -506928 668277 398241 -828176 394645
saveMatrixAsText_IOSTREAM()
saving matrix with iostream. (192.749seconds)
saveMatrixAsText_FMT(mat0_fmt.txt)
saving matrix with {fmt}. (34.4932seconds)
saveMatrixAsBinary()
saving matrix into a binary file. (30.7591seconds)
loadMatrixFromText_IOSTREAM()
loading matrix from text file with iostream. (102.074seconds)
randomMatrix == textMatrix
comparing randomMatrix with textMatrix. (0.125328seconds)
loadMatrixFromText_STDIO(mat0_fmt.txt)
loading matrix from text file with stdio. (71.2746seconds)
randomMatrix == textMatrix
comparing randomMatrix with textMatrix (stdio). (0.124684seconds)
loadMatrixFromBinary(mat0.bin)
loading matrix from binary file into memory. (0.495685seconds)
randomMatrix == inMemoryMatrix
comparing randomMatrix with inMemoryMatrix. (0.124206seconds)
mapMatrixFromBinaryFile(mat0.bin)
mapping a view to a matrix in a binary file. (4.5883e-05seconds)
randomMatrix == mappedMatrix
comparing randomMatrix with mappedMatrix. (0.158459seconds)

And here is the code:

#include <cinttypes>
#include <memory>
#include <random>
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <chrono>
#include <limits>
#include <iomanip>

// includes for mmap()...
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

// includes for {fmt}...
#include <fmt/core.h>
#include <fmt/os.h>

struct StopWatch {
  using Clock = std::chrono::high_resolution_clock;
  using TimePoint =
    std::chrono::time_point<Clock>;
  using Duration =
    std::chrono::duration<double>;
  
  void start(const char* description) {
    this->description = std::string(description);
    tstart = Clock::now();
  }
  void stop() {
    TimePoint tend = Clock::now();
    Duration elapsed = tend - tstart;
    std::cout << description << " (" << elapsed.count()
          << "seconds)" << std::endl;
  }
  TimePoint tstart;
  std::string description;
};

struct Matrix_t {
  uint32_t ncol;
  uint32_t nrow;
  float values[];
  inline uint32_t to_index(uint32_t col, uint32_t row) const {
    return ncol * row + col;
  }
};

template <class Initializer>
Matrix_t *createMatrix
( uint32_t ncol,
  uint32_t nrow,
  Initializer initFn
) {
  size_t nfloats = ncol*nrow;
  size_t nbytes = UINTMAX_C(8) + nfloats * sizeof(float);
  Matrix_t * result =
    reinterpret_cast<Matrix_t*>(operator new(nbytes));
  if (nullptr != result) {
    result->ncol = ncol;
    result->nrow = nrow;
    for (uint32_t row = 0; row < nrow; row++) {
      for (uint32_t col = 0; col < ncol; col++) {
    result->values[result->to_index(col,row)] =
      initFn(ncol,nrow,col,row);
      }
    }
  }
  return result;
}

void saveMatrixAsText_IOSTREAM(const char* filePath,
                   const Matrix_t* matrix) {
  std::cout << "saveMatrixAsText_IOSTREAM()" << std::endl;
  if (nullptr == matrix) {
    std::cout << "cannot save matrix - no matrix!" << std::endl;
    return;  // nothing to write; avoids dereferencing a null pointer below
  }
  std::ofstream outFile(filePath);
  if (outFile) {
    outFile << matrix->ncol << " " << matrix->nrow << std::endl;
    const auto defaultPrecision = outFile.precision();
    outFile.precision
      (std::numeric_limits<float>::max_digits10); 
    for (uint32_t row = 0; row < matrix->nrow; row++) {
      for (uint32_t col = 0; col < matrix->ncol; col++) {
    outFile << matrix->values[matrix->to_index(col,row)]
        << " ";
      }
      outFile << std::endl;
    }
  } else {
    std::cout << "could not open " << filePath << " for writing."
          << std::endl;
  }
}

void saveMatrixAsText_FMT(const char* filePath,
              const Matrix_t* matrix) {
  std::cout << "saveMatrixAsText_FMT(" << filePath << ")"
        << std::endl;
  if (nullptr == matrix) {
    std::cout << "cannot save matrix - no matrix!" << std::endl;
    return;  // nothing to write; avoids dereferencing a null pointer below
  }
  auto outFile = fmt::output_file(filePath);
  outFile.print("{} {}\n", matrix->ncol, matrix->nrow);
  for (uint32_t row = 0; row < matrix->nrow; row++) {
    outFile.print("{}", matrix->values[matrix->to_index(0,row)]);
    for (uint32_t col = 1; col < matrix->ncol; col++) {
      outFile.print(" {}",
            matrix->values[matrix->to_index(col,row)]);
    }
    outFile.print("\n");
  }
  
}

void saveMatrixAsBinary(const char* filePath,
            const Matrix_t* matrix) {
  std::cout << "saveMatrixAsBinary()" << std::endl;
  FILE * outFile = fopen(filePath, "wb");
  if (nullptr != outFile) {
    fwrite( &matrix->ncol, 4, 1, outFile);
    fwrite( &matrix->nrow, 4, 1, outFile);
    size_t nfloats = matrix->ncol * matrix->nrow;
    fwrite( &matrix->values, sizeof(float), nfloats, outFile);
    fclose(outFile);
  } else {
    std::cout << "could not open " << filePath << " for writing."
          << std::endl;
  }
}

Matrix_t* loadMatrixFromText_IOSTREAM(const char* filePath) {
  std::cout << "loadMatrixFromText_IOSTREAM()" << std::endl;
  std::ifstream inFile(filePath);
  if (inFile) {
    uint32_t ncol;
    uint32_t nrow;
    inFile >> ncol;
    inFile >> nrow;
    uint32_t nfloats = ncol * nrow;
    auto loader =
      [&inFile]
      (uint32_t , uint32_t , uint32_t , uint32_t )
      -> float
      {
    float value;
    inFile >> value;
    return value;
      };

    Matrix_t * matrix = createMatrix( ncol, nrow, loader);
    return matrix;
  } else {
    std::cout << "could not open " << filePath << " for reading."
          << std::endl;
  }
  return nullptr;
}

Matrix_t* loadMatrixFromText_STDIO(const char* filePath) {
  std::cout << "loadMatrixFromText_STDIO(" << filePath << ")"
        << std::endl;
  Matrix_t* matrix = nullptr;
  FILE * inFile = fopen(filePath, "rt");
  if (nullptr != inFile) {
    uint32_t ncol;
    uint32_t nrow;
    // SCNu32 (from <cinttypes>) matches uint32_t; "%d" expects a pointer to int
    fscanf(inFile, "%" SCNu32 " %" SCNu32, &ncol, &nrow);
    auto loader =
      [&inFile]
      (uint32_t , uint32_t , uint32_t , uint32_t )
      -> float
      {
    float value;
    fscanf(inFile, "%f", &value);
    return value;
      };
    matrix = createMatrix( ncol, nrow, loader);
    fclose(inFile);
  } else {
    std::cout << "could not open " << filePath << " for reading."
          << std::endl;
  }
  return matrix;
}

Matrix_t* loadMatrixFromBinary(const char* filePath) {
  std::cout << "loadMatrixFromBinary(" << filePath << ")"
        << std::endl;
  FILE * inFile = fopen(filePath, "rb");
  if (nullptr != inFile) {
    uint32_t ncol;
    uint32_t nrow;
    fread( &ncol, 4, 1, inFile);
    fread( &nrow, 4, 1, inFile);
    // size_t avoids 32-bit overflow for matrices larger than 4 GiB
    size_t nfloats = static_cast<size_t>(ncol) * nrow;
    size_t nbytes = nfloats * sizeof(float) + 8;  // 8 = the two uint32_t header fields
    Matrix_t* matrix =
      reinterpret_cast<Matrix_t*>
      (operator new (nbytes));
    if (nullptr != matrix) {
      matrix->ncol = ncol;
      matrix->nrow = nrow;
      fread( &matrix->values[0], sizeof(float), nfloats, inFile);
      fclose(inFile);  // close the file on the success path too
      return matrix;
    } else {
      std::cout << "could not find memory for the matrix."
        << std::endl;
    }
    fclose(inFile);
  } else {
    std::cout << "could not open file "
          << filePath << " for reading." << std::endl;
  }
  return nullptr;
}

void freeMatrix(Matrix_t* matrix) {
  operator delete(matrix);
}

Matrix_t* mapMatrixFromBinaryFile(const char* filePath) {
  std::cout << "mapMatrixFromBinaryFile(" << filePath << ")"
        << std::endl;
  Matrix_t * matrix = nullptr;
  int fd = open( filePath, O_RDONLY);
  if (-1 != fd) {
    struct stat sb;
    if (-1 != fstat(fd, &sb)) {
      auto fileSize = sb.st_size;
      void* addr =
        mmap(nullptr, fileSize, PROT_READ, MAP_PRIVATE, fd, 0);
      if (MAP_FAILED == addr) {  // mmap() reports failure with MAP_FAILED, not nullptr
        std::cout << "mmap() failed!" << std::endl;
      } else {
        matrix = reinterpret_cast<Matrix_t*>(addr);
      }
    } else {
      std::cout << "fstat() failed!" << std::endl;
    }
    close(fd);
  } else {
    std::cout << "open() failed!" << std::endl;
  }
  return matrix;
}

void unmapMatrix(Matrix_t* matrix) {
  if (nullptr == matrix)
    return;
  size_t nbytes =
    UINTMAX_C(8) +
    sizeof(float) * matrix->ncol * matrix->nrow;
  munmap(matrix, nbytes);
}

bool areMatricesEqual( const Matrix_t* m1, const Matrix_t* m2) {
  if (nullptr == m1) return false;
  if (nullptr == m2) return false;
  if (m1->ncol != m2->ncol) return false;
  if (m1->nrow != m2->nrow) return false;
  // both exist and have same size...
  size_t nfloats = m1->ncol * m1->nrow;
  size_t nbytes = nfloats * sizeof(float);
  return 0 == memcmp( m1->values, m2->values, nbytes);
}

int main(int argc, const char* argv[]) {
    std::random_device rdev;
    std::default_random_engine reng(rdev());
    std::uniform_real_distribution<> rdist(-1.0E6F, 1.0E6F);
    StopWatch sw;
    
    auto randomInitFunction =
      [&reng,&rdist]
      (uint32_t ncol, uint32_t nrow, uint32_t col, uint32_t row)
      -> float
      {
    return rdist(reng);
      };
    sw.start("creating random matrix (20k x 20k)");
    Matrix_t * randomMatrix =
      createMatrix(UINT32_C(20000),
           UINT32_C(20000),
           randomInitFunction);
    sw.stop();
    if (nullptr != randomMatrix) {
      std::cout
    << "the first 10 floating values in randomMatrix are: "
    << std::endl;
      std::cout << randomMatrix->values[0];
      for (size_t i = 1; i < 10; i++) {
    std::cout << " " << randomMatrix->values[i];
      }
      std::cout << std::endl;
    
      sw.start("saving matrix with iostream.");
      saveMatrixAsText_IOSTREAM("mat0_iostream.txt", randomMatrix);
      sw.stop();
      sw.start("saving matrix with {fmt}.");
      saveMatrixAsText_FMT("mat0_fmt.txt", randomMatrix);
      sw.stop();
      sw.start("saving matrix into a binary file.");
      saveMatrixAsBinary("mat0.bin", randomMatrix);
      sw.stop();
      
      sw.start("loading matrix from text file with iostream.");
      Matrix_t* textMatrix =
    loadMatrixFromText_IOSTREAM("mat0_iostream.txt");
      sw.stop();
      sw.start("comparing randomMatrix with textMatrix.");
      if (!areMatricesEqual(randomMatrix, textMatrix)) {
    std::cout << "randomMatrix != textMatrix!" << std::endl;
      } else {
    std::cout << "randomMatrix == textMatrix" << std::endl;
      }
      sw.stop();
      freeMatrix(textMatrix);
      textMatrix = nullptr;
      
      sw.start("loading matrix from text file with stdio.");
      textMatrix =
    loadMatrixFromText_STDIO("mat0_fmt.txt");
      sw.stop();
      sw.start("comparing randomMatrix with textMatrix (stdio).");
      if (!areMatricesEqual(randomMatrix, textMatrix)) {
    std::cout << "randomMatrix != textMatrix!" << std::endl;
      } else {
    std::cout << "randomMatrix == textMatrix" << std::endl;
      }
      sw.stop();
      freeMatrix(textMatrix);
      textMatrix = nullptr;
      
      sw.start("loading matrix from binary file into memory.");
      Matrix_t* inMemoryMatrix =
    loadMatrixFromBinary("mat0.bin");
      sw.stop();
      sw.start("comparing randomMatrix with inMemoryMatrix.");
      if (!areMatricesEqual(randomMatrix, inMemoryMatrix)) {
    std::cout << "randomMatrix != inMemoryMatrix!"
          << std::endl;
      } else {
    std::cout << "randomMatrix == inMemoryMatrix" << std::endl;
      }
      sw.stop();
      freeMatrix(inMemoryMatrix);
      inMemoryMatrix = nullptr;
      
      sw.start("mapping a view to a matrix in a binary file.");
      Matrix_t* mappedMatrix =
    mapMatrixFromBinaryFile("mat0.bin");
      sw.stop();
      sw.start("comparing randomMatrix with mappedMatrix.");
      if (!areMatricesEqual(randomMatrix, mappedMatrix)) {
    std::cout << "randomMatrix != mappedMatrix!"
          << std::endl;
      } else {
    std::cout << "randomMatrix == mappedMatrix" << std::endl;
      }
      sw.stop();
      unmapMatrix(mappedMatrix);
      mappedMatrix = nullptr;

      freeMatrix(randomMatrix);
    } else {
      std::cout << "could not create random matrix!" << std::endl;
    }

    return 0;
}

Please note that binary formats where you simply cast to a struct pointer also depend on how the compiler does alignment and padding within structures. In my case, I was lucky and it worked. On other systems, you might have to tweak a little (#pragma pack(4) or something along those lines) to make it work.
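
One way to make that dependency explicit is to let the compiler check the layout assumptions at build time. The following is a sketch only; MatrixHeader is a hypothetical stand-in for the fixed part of Matrix_t (the two uint32_t fields), and matrix_file_bytes is a made-up helper that computes the expected file size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the fixed-size header of the binary file.
struct MatrixHeader {
    std::uint32_t ncol;
    std::uint32_t nrow;
};

// The build fails if the compiler lays the header out differently from what
// the file format assumes.
static_assert(sizeof(MatrixHeader) == 8,
              "header must be exactly 8 bytes");
static_assert(offsetof(MatrixHeader, nrow) == 4,
              "nrow must follow ncol without padding");
static_assert(sizeof(float) == 4,
              "file format assumes 4-byte floats");
static_assert(sizeof(MatrixHeader) % alignof(float) == 0,
              "floats stored right after the header must be aligned");

// Expected size in bytes of a binary matrix file with the given dimensions.
constexpr std::size_t matrix_file_bytes(std::uint32_t ncol, std::uint32_t nrow)
{
    return sizeof(MatrixHeader)
         + static_cast<std::size_t>(ncol) * nrow * sizeof(float);
}
```

These checks do not cover byte order, so files still have to be written and read on machines with the same endianness.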

Could you please also add the sizes of the different resulting files? Have you tried using [`std::hexfloat`](https://en.cppreference.com/w/cpp/io/manip/fixed)? I'm curious if there could be any perceivable difference in timing or file size compared to the iostream version. — Bob__, Sep 10 '21 at 11:59
The binary file size is exactly what you calculate when looking at `Matrix_t`: 8 bytes for width and height + `width * height * sizeof(float)` (1600000008 bytes). The text file is of course bigger, because you want to read back the same float value you had when you saved, not some rounded version of it (round trip; see the `outFile.precision(..)` bit of the code. The default iostream precision (6 digits I think) is not enough to make the comparison test succeed). So the text file size is around 4465596638 bytes. — BitTickler, Sep 10 '21 at 19:49
`20000 20000 -873813.312 354560.094 318731.312 127849.352 264831.719 983111.25 -695509.125 ...` is what my text file looks like. I don't think std::hexfloat will make the representation shorter. — BitTickler, Sep 10 '21 at 19:58
