
I have a C program that writes several data files for a physics simulation. These data files are plain text files containing a 2D map of values ranging from -1 to +1. They can be quite big (around 100 MB each), but since many of the values are usually the same (long runs of +1 or -1), I thought that compressing them would be a good idea.

The relevant part of the C code that was writing the file was this:

FILE *fp1;
char file1[] = "output_file.dat";
fp1 = fopen(file1,"w");
for ( i = 0; i < Nx; i++ ) {
    for ( j = 0; j < Ny; j++ ) {
        fprintf(fp1, "%.5f ", creal(phi[i*Ny+j]));
    }
    fprintf(fp1, "\n");
}
fclose(fp1);

And the relevant part of the Python code that was reading the file was:

import numpy as np
data = np.loadtxt("output_file.dat")

Now, I am trying to add compression using zlib library. I changed the C code in the following way:

#include <zlib.h>
gzFile fp1;
char file1[] = "output_file.dat";
fp1 = gzopen(file1,"w");
for ( i = 0; i < Nx; i++ ) {
    for ( j = 0; j < Ny; j++ ) {
        gzprintf(fp1, "%.5f ", creal(phi[i*Ny+j]));
    }
    gzprintf(fp1, "\n");
}
gzclose(fp1);

And the Python code:

import numpy as np
import zlib
with open("output_file.dat", 'rb') as f:
    compressed_data = f.read()
data = zlib.decompress(compressed_data)

The C code seems to work nicely. The data files are being written and they are smaller than 2 MB (which is reasonable, given the redundancy of the contents). Unfortunately, the Python script gives me an error:

error: Error -3 while decompressing data: incorrect header check

Can anybody point me in the right direction on how to debug this? Thank you!
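One way to debug this kind of error is to look at the first bytes of the file. Files written with gzopen()/gzprintf() are in gzip format and start with the magic bytes 0x1f 0x8b, whereas zlib.decompress() with its default arguments expects a raw zlib stream (first byte usually 0x78). A minimal sketch, assuming Python 3; `sniff_format` is a hypothetical helper name, and the demo uses Python's gzip module as a stand-in for the C program's output:

```python
import gzip
import os
import tempfile
import zlib


def sniff_format(path):
    """Guess the container format from the first two bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head == b"\x1f\x8b":  # gzip magic number (what gzopen writes)
        return "gzip"
    # zlib header: compression method 8 (deflate) in the low nibble of the
    # first byte, and the 16-bit header value must be a multiple of 31.
    if len(head) == 2 and (head[0] & 0x0F) == 8 and ((head[0] << 8) | head[1]) % 31 == 0:
        return "zlib"
    return "unknown"


# Stand-in for the C program: write a small gzip file, as gzopen/gzprintf would.
path = os.path.join(tempfile.mkdtemp(), "output_file.dat")
with gzip.open(path, "wt") as f:
    f.write("1.00000 1.00000 -1.00000 \n")

print(sniff_format(path))  # gzip

with open(path, "rb") as f:
    raw = f.read()
try:
    zlib.decompress(raw)  # default wbits expects a zlib header, not gzip
except zlib.error as exc:
    print(exc)  # Error -3 while decompressing data: incorrect header check
```

If the helper reports "gzip", the file is fine and only the reading side needs to change.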

martineau
Tropilio

1 Answer


OK, the solution turned out to be very simple. If I write the data files with a .gz extension:

#include <zlib.h>
gzFile fp1;
char file1[] = "output_file.gz";
fp1 = gzopen(file1,"w");
for ( i = 0; i < Nx; i++ ) {
    for ( j = 0; j < Ny; j++ ) {
        gzprintf(fp1, "%.5f ", creal(phi[i*Ny+j]));
    }
    gzprintf(fp1, "\n");
}
gzclose(fp1);

Then I can use the loadtxt function to read them, and numpy decompresses them automatically (np.loadtxt decompresses files whose names end in .gz or .bz2):

import numpy as np
data = np.loadtxt("output_file.gz")
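Because np.loadtxt chooses whether to decompress from the file name, a gzip file that keeps the .dat name is still read as plain text and fails. If renaming is not an option, one alternative is to open the file with Python's gzip module and hand the file object to np.loadtxt. A minimal sketch, assuming numpy is installed; the sample rows are made up and only mimic the "%.5f " layout of the C code:

```python
import gzip
import os
import tempfile

import numpy as np

tmp = tempfile.mkdtemp()

# Made-up sample data in the same layout as the C code: "%.5f " per value, one row per line.
rows = [[1.0, 1.0, -1.0], [-1.0, 0.5, 1.0]]
text = "".join("".join(f"{v:.5f} " for v in row) + "\n" for row in rows)

# 1) With a .gz extension, np.loadtxt decompresses the file transparently.
gz_path = os.path.join(tmp, "output_file.gz")
with gzip.open(gz_path, "wt") as f:
    f.write(text)
data_gz = np.loadtxt(gz_path)

# 2) Keeping the .dat name: open it with gzip and pass the text-mode file object.
dat_path = os.path.join(tmp, "output_file.dat")
with gzip.open(dat_path, "wt") as f:
    f.write(text)
with gzip.open(dat_path, "rt") as f:
    data_dat = np.loadtxt(f)

print(data_gz.shape)  # (2, 3)
```

Both paths give the same array; the second one just makes the decompression explicit instead of relying on the extension.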

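Alternatively, I could still use the zlib.decompress function, but pass it one more argument (as explained in this question). gzopen writes gzip-format data, while zlib.decompress expects a zlib header by default, which is why the original call failed with "incorrect header check". A wbits value of 15 + 32 tells zlib to detect either header automatically: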

zlib.decompress(compressed_data, 15 + 32)
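Note that zlib.decompress returns raw bytes, not a numeric array, so the result still has to be decoded and parsed. A minimal sketch, assuming Python 3 and numpy; the bytes are generated with gzip.compress as a stand-in for what the C program writes, and it also shows two other standard-library calls that accept gzip input:

```python
import gzip
import io
import zlib

import numpy as np

# Stand-in for the bytes read from a file written by gzopen/gzprintf.
compressed_data = gzip.compress(b"1.00000 -1.00000 \n-1.00000 1.00000 \n")

a = zlib.decompress(compressed_data, 15 + 32)              # auto-detect zlib or gzip header
b = zlib.decompress(compressed_data, zlib.MAX_WBITS | 16)  # accept a gzip header only
c = gzip.decompress(compressed_data)                       # stdlib gzip wrapper
assert a == b == c

# The decompressed result is bytes: decode it and parse it into a 2D array.
data = np.loadtxt(io.StringIO(a.decode("ascii")))
print(data)
```

The 15 + 32 form is the most forgiving; gzip.decompress is arguably the most readable when the input is always gzip.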
Tropilio