How to read binary files properly?

Question

I have a problem with the NIST/Diehard Binary Matrix Rank test. It divides a binary sequence into 32x32 matrices and calculates the rank of each one. After calculating the ranks I need to compute a chi-squared (χ²) value and then a p-value (which must be between 0 and 1). I'm getting an extremely small p-value even for a random sequence.

I've tried hardcoding some small examples and got the right p-value for them, so I think my problem is in reading the binary sequence file and extracting bits from it.

This is the code that reads the file and converts it to a bit sequence:

```cpp
ifstream fin("seq1.bin", ios::binary);
fin.seekg(0, ios::end);
int n = fin.tellg();          // file size in bytes
char *buf = new char[n];
fin.seekg(0, ios::beg);
fin.read(buf, n);
n *= 8;                       // number of bits
bool *s = new bool[n];
// Unpack each byte: s[i*8 + 0] gets bit 7, ..., s[i*8 + 7] gets bit 0
for (int i = 0; i < n / 8; i++) {
    for (int j = 7; j >= 0; j--) {
        s[(i) * 8 + 7 - j] = (bool)((buf[i] >> j) & 1);
    }
}
```
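A minimal sketch of the same reading step that avoids the `tellg()` size trick and manual `new[]` (both raised in the comments below), using `std::vector` instead. The helper names `read_bytes` and `unpack_bits_msb_first` are hypothetical, not from the question, and the sketch assumes the same bit order as the loop above (bit 7 of each byte stored first). Bits are kept as `char` rather than `bool` to sidestep `std::vector<bool>`.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read the whole file into memory without seekg/tellg.
std::vector<unsigned char> read_bytes(const std::string& path) {
    std::ifstream fin(path, std::ios::binary);
    if (!fin) throw std::runtime_error("cannot open " + path);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(fin),
                                      std::istreambuf_iterator<char>());
}

// Hypothetical helper: expand each byte into 8 bits, most significant bit
// first, so bits[i * 8] holds bit 7 of byte i (same order as the loop above).
std::vector<char> unpack_bits_msb_first(const std::vector<unsigned char>& bytes) {
    std::vector<char> bits;  // char instead of bool to avoid std::vector<bool>
    bits.reserve(bytes.size() * 8);
    for (unsigned char byte : bytes)
        for (int j = 7; j >= 0; --j)
            bits.push_back(static_cast<char>((byte >> j) & 1));
    return bits;
}
```

Using `unsigned char` also means no sign extension can occur during the shifts, so there is nothing implementation-defined left in the bit extraction.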

Then I form my matrices and calculate their ranks:

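```cpp
int *ranks = new int[N];

for (int i = 0; i < N; i++) {
    // Copy the i-th block of m*q bits into its own m x q matrix
    bool *arr = new bool[m*q];
    copy(s + i * m*q, s + (i * m*q) + (m * q), arr);
    ranks[i] = binary_rank(arr, m, q);
    delete[] arr;  // otherwise every iteration leaks one matrix
}
```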
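`binary_rank` isn't shown, so one thing worth testing is whether it computes the rank over GF(2) (arithmetic mod 2, as the NIST test requires) rather than over the reals. The two can differ: the rows 110, 011, 101 are independent over the reals but 110 XOR 011 = 101, so the GF(2) rank is 2. Below is a sketch of Gauss-Jordan elimination with XOR as row addition; `gf2_rank` is a hypothetical name, not the asker's function, and it assumes q <= 64 so a row fits in a `uint64_t`.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: rank over GF(2) of an m x q bit matrix stored
// row-major in `bits` (one 0/1 value per element). Requires q <= 64.
int gf2_rank(const char* bits, int m, int q) {
    // Pack each row into one integer; column 0 becomes the highest bit.
    std::vector<std::uint64_t> rows(m, 0);
    for (int r = 0; r < m; ++r)
        for (int c = 0; c < q; ++c)
            if (bits[r * q + c])
                rows[r] |= std::uint64_t{1} << (q - 1 - c);

    int rank = 0;
    for (int col = q - 1; col >= 0 && rank < m; --col) {
        std::uint64_t mask = std::uint64_t{1} << col;
        // Find a pivot row (at or below `rank`) with a 1 in this column.
        int pivot = -1;
        for (int r = rank; r < m; ++r)
            if (rows[r] & mask) { pivot = r; break; }
        if (pivot < 0) continue;
        std::swap(rows[rank], rows[pivot]);
        // Clear this column in every other row (XOR is addition mod 2).
        for (int r = 0; r < m; ++r)
            if (r != rank && (rows[r] & mask))
                rows[r] ^= rows[rank];
        ++rank;
    }
    return rank;
}
```

Comparing its output with `binary_rank` on a few hand-built matrices (identity, all ones, two equal rows) is a quick way to test the hypothesis that the bug is elsewhere.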

Checking occurrences in ranks:

```cpp
int count_occurrences(int arr[], int n, int x){
    int result = 0;
    for (int i = 0; i < n; i++)
        if (x == arr[i])
            result++;
    return result;
}
```
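The question doesn't show how `fm`, `fm_1` and `remaining` are obtained from `ranks`. One possible pitfall: if `remaining` were counted only as occurrences of rank M-2, matrices of rank M-3 or lower would be dropped and χ² would grow with N. A sketch using `std::count` that defines `remaining` as N - F_M - F_{M-1} (the `RankCounts` struct and `tally_ranks` are hypothetical names):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical tally of the three classes used by the binary matrix rank
// test: full rank (M), rank M-1, and every other rank.
struct RankCounts {
    int fm;
    int fm_1;
    int remaining;
};

RankCounts tally_ranks(const std::vector<int>& ranks, int M) {
    RankCounts c{};
    c.fm = static_cast<int>(std::count(ranks.begin(), ranks.end(), M));
    c.fm_1 = static_cast<int>(std::count(ranks.begin(), ranks.end(), M - 1));
    // Everything else, including ranks below M-2.
    c.remaining = static_cast<int>(ranks.size()) - c.fm - c.fm_1;
    return c;
}
```

For 32x32 matrices, M = 32, and the three fields can be passed straight to `calculate_xi`.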

Calculating χ² and the p-value:

```cpp
#include <cmath>  // exp

// fm: full-rank count, fm_1: rank m-1 count, remaining: all other matrices
double calculate_xi(int fm, int fm_1, int remaining, int N) {
    double N1 = 0.2888*N;
    double N2 = 0.5776*N;
    double N3 = 0.1336*N;
    double x1 = (fm - N1)*(fm - N1) / N1;
    double x2 = (fm_1 - N2)*(fm_1 - N2) / N2;
    double x3 = (remaining - N3)*(remaining - N3) / N3;
    return x1 + x2 + x3;
}
// Chi-squared tail probability with 2 degrees of freedom
double calculate_pvalue(double xi2) {
    return exp(-(xi2 / 2));
}
```
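For reference, with `remaining` = N - F_M - F_{M-1}, the two functions above compute the statistic below, where M = 32 and F_M, F_{M-1} are the counts of full-rank and rank-(M-1) matrices. Three classes give 2 degrees of freedom, and for 2 degrees of freedom the chi-squared tail probability is exactly e^{-χ²/2}, so `calculate_pvalue` matches the formula. A p-value of exactly 0 therefore means χ² is very large: in double precision `exp` underflows to 0 once χ² exceeds roughly 1490.

```latex
\chi^2 = \frac{(F_M - 0.2888N)^2}{0.2888N}
       + \frac{(F_{M-1} - 0.5776N)^2}{0.5776N}
       + \frac{(N - F_M - F_{M-1} - 0.1336N)^2}{0.1336N},
\qquad
P\text{-value} = e^{-\chi^2/2}
```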

I expect a p-value between 0 and 1, but I get 0 every time. That's because of the extremely large χ² value, and I couldn't find what I've done wrong. Could you please help me get this right?

You should (learn to) use a debugger. Step through your code one line at a time until the values of your variables differ from what you expect. Your file reading code looks OK to me, BTW. — john, Jun 05 '19 at 16:56
Marginally related reading: [tellg() function give wrong size of file?](https://stackoverflow.com/questions/22984956/tellg-function-give-wrong-size-of-file) Using `tellg` to get the size of the file falls into the "Usually works, but not guaranteed" camp. — user4581301, Jun 05 '19 at 17:03
Sidenote: Prefer a `std::vector<char> buf(n);` to `char *buf = new char[n];` `vector` leaves you with fewer opportunities for error. — user4581301, Jun 05 '19 at 17:06
Sidenote to sidenote: Read [`std::vector<bool>`](https://en.cppreference.com/w/cpp/container/vector_bool) before trying the same trick with `bool`. `vector`s of `bool` are so weird that they get their own documentation page. — user4581301, Jun 05 '19 at 17:08
_"I think my problem is [...]"_ -- Good, you have a hypothesis. The next step of the scientific method is to test that hypothesis. A debugger could help with that. Or maybe you want to write out your `s` array after you finish constructing it. Whichever method you choose, let us know the result. — JaMiT, Jun 06 '19 at 01:56

score 0 · Answer 1 · edited Jun 05 '19 at 18:15

For this part:

```cpp
for (int i = 0; i < n / 8; i++) {
    for (int j = 7; j >= 0; j--) {
        s[(i) * 8 + 7 - j] = (bool)((buf[i] >> j) & 1);
    }
}
```

When you add elements to the `s` array, it looks like you switch the positions of the bits inside each character: the last bit of a character in `buf` goes into the first position of that character's group in the `s` array. The shift is initially 7, so you take the first bit of the char from `buf[]`, but the index in `s[]` looks to be 0, resulting in a swap. This is easy to verify with a debugger, though, as it is not so obvious from the code. Thanks.
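One way to check the bit order without a debugger is to run a single known byte through the same expression as the question's loop and print the bits in the order they would land in `s[i*8 + 0]` ... `s[i*8 + 7]`. `bits_as_stored` is a hypothetical helper written for this check.

```cpp
#include <string>

// Hypothetical helper: mirrors the inner loop of the question, appending the
// bit that would be stored at s[i*8 + 7 - j] for j = 7, 6, ..., 0, so the
// returned string lists positions 0..7 of that byte's group in s.
std::string bits_as_stored(char byte) {
    std::string out;
    for (int j = 7; j >= 0; j--)
        out += ((byte >> j) & 1) ? '1' : '0';
    return out;
}
```

For example, `bits_as_stored(0x01)` returns `"00000001"` and `bits_as_stored(static_cast<char>(0x80))` returns `"10000000"`, i.e. the most significant bit is stored first. Whether that is the order you want depends on how `seq1.bin` was generated, so compare against a byte whose intended bit sequence you know.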
