how to save a double matrix quickly and compactly if some loss of precision is acceptable?

Question

I have a matrix of double whose size is 1024x1024. I want to write it to a file. Here is the solution for this. Accepted answer is 50-60% more time efficient than the second answer in the question as expected (according to my simple test which is to write to a file in both methods).

There is another solution which is to write to csv file(accepted answer in the question), it is much slower(3-4 times slower)

While I'm writing to file the number of floating point of each element in the matrix is 16 and the output is like following:

-1.6819883882999420e-001 -3.5269531607627869e-001 2.4137189984321594e-001 -3.9325976371765137e-001 -2.2069962322711945e-001 -5.9525445103645325e-002

When I write it to file, file size becomes 24 MB in the first way(first link, accepted answer) and 37 MB in the third way(second link, accepted answer) both of which are unacceptable.

I need to set precision of matrix in a fast way and my output become like -1.6819e-01 -3.5269e-01. Any help will be appreciated.

What I'm doing is read an image of 1024x1024, then process it, then write output Mat(which is double) to a file. Consider I have, thousands of image, my images are all less than 1 MB, my running time for each image is less than 1 sec without saving.

EDIT: When I save the same matrix in Matlab, it becomes 6.75 MB

If you're concerned about performance, why not using binary format? — Mikhail, Oct 07 '13 at 10:28
Assuming x86, you want to store roughly 8MB of information, which is only this size when using binary double representation. Since you need at least numeric_limits::max_digits10 (which is usually 17) plus the e plus separator plus occasional minus and exponent, this is roughly 20-24 bytes (depending on your actual values) per double, so 24MB seems like a good amount, and not unacceptable. — PlasmaHH, Oct 07 '13 at 10:35
@PlasmaHH agreed, a pretty elegant and mathematical explanation :) — Aleksander Lidtke, Oct 07 '13 at 10:46
@PlasmaHH: It keeps in XML, so I think it is stored as ascii. Each character is 1B. Assuming each element is 24 character, `1Mx24xB=24MB` which is equal to what I get :) — smttsp, Oct 07 '13 at 10:51

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

edit

now has an option to store in 16bit floats.
- you lose some precision(depends on data range,vary the rnd distribution to see different error rates).
- slower(20-30ms)(see the link below for probably faster tricks if it matters)
- 2mb
if you know the nature of your data(range,distribution surface(maybe variable length encoding)) more can be done probably
- see here(32-bit to 16-bit Floating Point Conversion).

code

//using code lifted from http://www.mathworks.com/matlabcentral/fileexchange/23173    
#include "opencv2/opencv.hpp"
#include <string.h>
#include <math.h>

using namespace cv;

#define  INT16_TYPE          short
#define UINT16_TYPE unsigned short
#define  INT32_TYPE          long
#define UINT32_TYPE unsigned long

int doubles2halfp(void *target, void *source, int numel);
int halfp2doubles(void *target, void *source, int numel);

void writemat(char* fpath,Mat& data,bool isf16)
{

    FILE* fp = fopen(fpath,"wb");
    if (!fp)perror("fopen");
    double dbuf[1024];

    for(int i=0;i<1024;++i)
    {
        for(int j=0;j<1024;++j)
            dbuf[j]=data.at<double>(i,j);     
        if(isf16)
        {
            UINT16_TYPE hbuf[1024];
            doubles2halfp(&hbuf,&dbuf,1024);
            fwrite(&hbuf,sizeof(UINT16_TYPE),1024,fp);
        }else
        {
            fwrite(&dbuf,sizeof(double),1024,fp);
        }
    }
    fclose(fp);
}

void readmat(char* fpath,Mat& data,bool isf16)
{
    FILE* fp = fopen(fpath,"rb");
    if (!fp)perror("fopen");

    double dbuf[1024];
    for(int i=0;i<1024;++i)
    {
        if(isf16)
        {
            UINT16_TYPE hbuf[1024];
            fread(&hbuf,sizeof(UINT16_TYPE),1024,fp);
            halfp2doubles(&dbuf,&hbuf,1024);
        }else{

            fread(&dbuf,sizeof(double),1024,fp);
        }
        for(int j=0;j<1024;++j)
        {
            data.at<double>(i,j)=dbuf[j];
        }
    }
    fclose(fp);
}

int main()
{
    RNG rng=  theRNG();
    Mat data = Mat::zeros(Size(1024,1024),CV_64FC1);
    for(int i=0;i<1024;++i)
        for(int j=0;j<1024;++j)
            data.at<double>(i,j)=rng.uniform(-1.,1.);

    writemat("img.bin",data,true); 

    Mat res = Mat::zeros(Size(1024,1024),CV_64FC1);
    readmat("img.bin",res,true);

    double error=0;
    for(int i=0;i<1024;++i)
        for(int j=0;j<1024;++j)
        {
            //printf("%f %f\n",data.at<double>(i,j),res.at<double>(i,j));
            error+=abs(data.at<double>(i,j)-res.at<double>(i,j));
        }
    printf("err=%f avgerr=%f\n",error,error/1024/1024);

    getchar();
    return 0;
}

////////////////////////////////////////
////////////////////////////////////////
////////////////////////////////////////
////////////////////////////////////////


int doubles2halfp(void *target, void *source, int numel)
{
    UINT16_TYPE *hp = (UINT16_TYPE *) target; // Type pun output as an unsigned 16-bit int
    UINT32_TYPE *xp = (UINT32_TYPE *) source; // Type pun input as an unsigned 32-bit int
    UINT16_TYPE    hs, he, hm;
    UINT32_TYPE x, xs, xe, xm;
    int hes;
    static int next;  // Little Endian adjustment
    static int checkieee = 1;  // Flag to check for IEEE754, Endian, and word size
    double one = 1.0; // Used for checking IEEE754 floating point format
    UINT32_TYPE *ip; // Used for checking IEEE754 floating point format

    if( checkieee ) { // 1st call, so check for IEEE754, Endian, and word size
        ip = (UINT32_TYPE *) &one;
        if( *ip ) { // If Big Endian, then no adjustment
            next = 0;
        } else { // If Little Endian, then adjustment will be necessary
            next = 1;
            ip++;
        }
        if( *ip != 0x3FF00000u ) { // Check for exact IEEE 754 bit pattern of 1.0
            return 1;  // Floating point bit pattern is not IEEE 754
        }
        if( sizeof(INT16_TYPE) != 2 || sizeof(INT32_TYPE) != 4 ) {
            return 1;  // short is not 16-bits, or long is not 32-bits.
        }
        checkieee = 0; // Everything checks out OK
    }

    xp += next;  // Little Endian adjustment if necessary

    if( source == NULL || target == NULL ) { // Nothing to convert (e.g., imag part of pure real)
        return 0;
    }

    while( numel-- ) {
        x = *xp++; xp++; // The extra xp++ is to skip over the remaining 32 bits of the mantissa
        if( (x & 0x7FFFFFFFu) == 0 ) {  // Signed zero
            *hp++ = (UINT16_TYPE) (x >> 16);  // Return the signed zero
        } else { // Not zero
            xs = x & 0x80000000u;  // Pick off sign bit
            xe = x & 0x7FF00000u;  // Pick off exponent bits
            xm = x & 0x000FFFFFu;  // Pick off mantissa bits
            if( xe == 0 ) {  // Denormal will underflow, return a signed zero
                *hp++ = (UINT16_TYPE) (xs >> 16);
            } else if( xe == 0x7FF00000u ) {  // Inf or NaN (all the exponent bits are set)
                if( xm == 0 ) { // If mantissa is zero ...
                    *hp++ = (UINT16_TYPE) ((xs >> 16) | 0x7C00u); // Signed Inf
                } else {
                    *hp++ = (UINT16_TYPE) 0xFE00u; // NaN, only 1st mantissa bit set
                }
            } else { // Normalized number
                hs = (UINT16_TYPE) (xs >> 16); // Sign bit
                hes = ((int)(xe >> 20)) - 1023 + 15; // Exponent unbias the double, then bias the halfp
                if( hes >= 0x1F ) {  // Overflow
                    *hp++ = (UINT16_TYPE) ((xs >> 16) | 0x7C00u); // Signed Inf
                } else if( hes <= 0 ) {  // Underflow
                    if( (10 - hes) > 21 ) {  // Mantissa shifted all the way off & no rounding possibility
                        hm = (UINT16_TYPE) 0u;  // Set mantissa to zero
                    } else {
                        xm |= 0x00100000u;  // Add the hidden leading bit
                        hm = (UINT16_TYPE) (xm >> (11 - hes)); // Mantissa
                        if( (xm >> (10 - hes)) & 0x00000001u ) // Check for rounding
                            hm += (UINT16_TYPE) 1u; // Round, might overflow into exp bit, but this is OK
                    }
                    *hp++ = (hs | hm); // Combine sign bit and mantissa bits, biased exponent is zero
                } else {
                    he = (UINT16_TYPE) (hes << 10); // Exponent
                    hm = (UINT16_TYPE) (xm >> 10); // Mantissa
                    if( xm & 0x00000200u ) // Check for rounding
                        *hp++ = (hs | he | hm) + (UINT16_TYPE) 1u; // Round, might overflow to inf, this is OK
                    else
                        *hp++ = (hs | he | hm);  // No rounding
                }
            }
        }
    }
    return 0;
}

int halfp2doubles(void *target, void *source, int numel)
{
    UINT16_TYPE *hp = (UINT16_TYPE *) source; // Type pun input as an unsigned 16-bit int
    UINT32_TYPE *xp = (UINT32_TYPE *) target; // Type pun output as an unsigned 32-bit int
    UINT16_TYPE h, hs, he, hm;
    UINT32_TYPE xs, xe, xm;
    INT32_TYPE xes;
    int e;
    static int next;  // Little Endian adjustment
    static int checkieee = 1;  // Flag to check for IEEE754, Endian, and word size
    double one = 1.0; // Used for checking IEEE754 floating point format
    UINT32_TYPE *ip; // Used for checking IEEE754 floating point format

    if( checkieee ) { // 1st call, so check for IEEE754, Endian, and word size
        ip = (UINT32_TYPE *) &one;
        if( *ip ) { // If Big Endian, then no adjustment
            next = 0;
        } else { // If Little Endian, then adjustment will be necessary
            next = 1;
            ip++;
        }
        if( *ip != 0x3FF00000u ) { // Check for exact IEEE 754 bit pattern of 1.0
            return 1;  // Floating point bit pattern is not IEEE 754
        }
        if( sizeof(INT16_TYPE) != 2 || sizeof(INT32_TYPE) != 4 ) {
            return 1;  // short is not 16-bits, or long is not 32-bits.
        }
        checkieee = 0; // Everything checks out OK
    }

    xp += next;  // Little Endian adjustment if necessary

    if( source == NULL || target == NULL ) // Nothing to convert (e.g., imag part of pure real)
        return 0;

    while( numel-- ) {
        h = *hp++;
        if( (h & 0x7FFFu) == 0 ) {  // Signed zero
            *xp++ = ((UINT32_TYPE) h) << 16;  // Return the signed zero
        } else { // Not zero
            hs = h & 0x8000u;  // Pick off sign bit
            he = h & 0x7C00u;  // Pick off exponent bits
            hm = h & 0x03FFu;  // Pick off mantissa bits
            if( he == 0 ) {  // Denormal will convert to normalized
                e = -1; // The following loop figures out how much extra to adjust the exponent
                do {
                    e++;
                    hm <<= 1;
                } while( (hm & 0x0400u) == 0 ); // Shift until leading bit overflows into exponent bit
                xs = ((UINT32_TYPE) hs) << 16; // Sign bit
                xes = ((INT32_TYPE) (he >> 10)) - 15 + 1023 - e; // Exponent unbias the halfp, then bias the double
                xe = (UINT32_TYPE) (xes << 20); // Exponent
                xm = ((UINT32_TYPE) (hm & 0x03FFu)) << 10; // Mantissa
                *xp++ = (xs | xe | xm); // Combine sign bit, exponent bits, and mantissa bits
            } else if( he == 0x7C00u ) {  // Inf or NaN (all the exponent bits are set)
                if( hm == 0 ) { // If mantissa is zero ...
                    *xp++ = (((UINT32_TYPE) hs) << 16) | ((UINT32_TYPE) 0x7FF00000u); // Signed Inf
                } else {
                    *xp++ = (UINT32_TYPE) 0xFFF80000u; // NaN, only the 1st mantissa bit set
                }
            } else {
                xs = ((UINT32_TYPE) hs) << 16; // Sign bit
                xes = ((INT32_TYPE) (he >> 10)) - 15 + 1023; // Exponent unbias the halfp, then bias the double
                xe = (UINT32_TYPE) (xes << 20); // Exponent
                xm = ((UINT32_TYPE) hm) << 10; // Mantissa
                *xp++ = (xs | xe | xm); // Combine sign bit, exponent bits, and mantissa bits
            }
        }
        xp++; // Skip over the remaining 32 bits of the mantissa
    }
    return 0;
}

Thanks for solution. It works fine most of the times, but when I run write function 3-4 times, it gives exception. when there is no output file it creates and writes, for the second one it overwrites, third on gives exception, namely, `Access violation`. Also when I run it over and over, sometimes, it is not filling the files(file size = 0). Is this something expected or do I miss something? — smttsp, Oct 08 '13 at 10:17
no it wasnt expected, i have run it in a loop a thousand times and it finishes fine.. are you perhaps using threads? can you post a simple example that reproduce the problem? — Zaw Lin, Oct 08 '13 at 10:21
I'm on windows 7 too, using visual studio 2010. I'm running write function 3-4 times for the same file when I run for different files there is no problem. — smttsp, Oct 08 '13 at 10:31
If I run it like, `for(i=1-->10){ runIt(images(i));}` it is fine. But when I do `for(i=1-->10){ runIt(images(0));}`, the file size of the output is 0. It didn't give any exception. — smttsp, Oct 08 '13 at 10:36
this is very strange,i have updated the example program which runs in a loop 10 times, it runs ok. is it the same as yours? — Zaw Lin, Oct 08 '13 at 10:41
The only difference is, I changed `1024 to data.cols, data.rows` and instead of allocating `double buffer[1024]`, I allocate `double* buffer = new double[data.cols];` does it make difference? I dont see any problem related to that — smttsp, Oct 08 '13 at 10:56
did you change fwrite also to `fwrite(buffer,sizeof(double),1024,fp);` ? (you need to) — Zaw Lin, Oct 08 '13 at 11:01
Yeah, I changed 1024 to data.cols because, in some cases data.cols might be greater than 1024. — smttsp, Oct 08 '13 at 11:10
Thanks, it is working now without any exception, but I don't understand the reason why it gives exception in the third run not first or second. Also, what is the logic behind having & and not having it in this situation? It really doesn't make sense. — smttsp, Oct 08 '13 at 12:13
fwrite expect a pointer, but buffer is already a pointer(originally it was an array, so `&` is fine since it return the address of the array). so it returns a pointer to a pointer which is just random address in memory. i think when the exception occurs the random address is located close to a protected region of memory. that's probably why — Zaw Lin, Oct 08 '13 at 12:19
originally i was testing unbuffered version which required & for single variable/s. then i include a tmp buffer, i just kept it there and it still works correctly. but apparently `&` in this case is undefined behaviour according to this answer(http://stackoverflow.com/questions/18214684/does-ampersand-in-front-of-chars-array-affect-scanf-is-it-legit). i wasnt aware of this til now. — Zaw Lin, Oct 08 '13 at 12:31
I see, now makes sense. Thanks a lot for your help. It is fast but still I need to make it less than 8MB — smttsp, Oct 08 '13 at 19:59
are you fine with saving floats? that will cut it down to 4mb. — Zaw Lin, Oct 09 '13 at 01:41
I'm fine with saving as float. Even string or another type is fine if I could read it as float/double later on. 5-6 digit is enough for me. — smttsp, Oct 09 '13 at 05:07
just curious what kind of storage size are you expecting in the end? — Zaw Lin, Oct 12 '13 at 16:39
I (will) have around 1M images of size 1024x1024 and from that images I will have to get this output. It will make me have 4TB at the end using the method you proposed. I don't have any minimum data size limit but I want to reach to the smallest data size I could reach — smttsp, Oct 13 '13 at 07:12
do you know the range of data? if it's ok/possible to squeeze them into a particular range([-1,1] for example) you can get out more precision by using fixed point representation than floating points. also easier to control data size than floats(9-10bits etc) — Zaw Lin, Oct 13 '13 at 07:25
The range is not determined, but most likely [-5,+5] is enough. It looks better this way. But what is the range going to be? Could it be between [-10,+10] with that many bits. Also, I couldnt reply because my machine failed to work these days — smttsp, Oct 22 '13 at 10:34
current code store in ieee half floats(16bits) which reduce it 2megabyte storage. refer here(http://www.opengl.org/wiki/Small_Float_Formats) for representalbe range/precision. but if range is known and is uniformly distributed, you can see (http://stackoverflow.com/a/1659563/1662574). it involves much less math than floats and you can try different number of bit widths easier(9,10,11 bits) etc to see which one give a good balance of precision/compression. although still have to figure out a how to pack non-byte aligned bits together to get actual space savings. — Zaw Lin, Oct 22 '13 at 10:52

score 0 · Answer 2 · answered Oct 07 '13 at 12:02

0

Consider using HDF5 for this. HDF5 is a standard file format (with C and C++ implementations) which lets you store a lot of types of data, but it's especially good for numeric matrices. If you store your data as a compressed HDF5 dataset using float (32-bit) values, I bet it will be much smaller than even the Matlab result.

http://www.hdfgroup.org/HDF5/Tutor/compress.html

answered Oct 07 '13 at 12:02

John Zwinck

239,568
38
324
436

I think my data is not compressible, because after I process the data I got is some random values in each element in the matrix. Matlab gives me 6-7MB output, when I try Zaw Lin's solution after converting to float, it becomes 4MB which is both fast and better than matlab result. If you still think, it will be better, I will try your method, too. By the way, thanks for the idea. – smttsp Oct 09 '13 at 08:24

how to save a double matrix quickly and compactly if some loss of precision is acceptable?

2 Answers2

edit

code