
On Windows, I am using the CryptGenRandom API in C (I assumed it would be equivalent to /dev/random or /dev/urandom on Linux). To confirm this, I generated random files with CryptGenRandom on Windows and by reading from /dev/urandom on Linux, then analyzed the results using ent.

The code sample I used to generate the random file using CryptGenRandom (originally from here):

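#include <windows.h>
#include <stdio.h>   /* printf, fopen, fwrite, fclose */
#include <stdlib.h>  /* abort, malloc, free */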

static void
secure_entropy(void *buf, size_t len)
{
    HCRYPTPROV h = 0;
    DWORD type = PROV_RSA_FULL;
    DWORD flags = CRYPT_VERIFYCONTEXT | CRYPT_SILENT;
    if (!CryptAcquireContext(&h, 0, 0, type, flags) ||
        !CryptGenRandom(h, len, buf)) {
        printf("failed to gather entropy");
        abort();
    }
    CryptReleaseContext(h, 0);
}

void test4()
{
    size_t size = 1 << 20;

    FILE *tfile = fopen("random_file", "w");
    char *buf = malloc(size);
    secure_entropy(buf, size);
    fwrite(buf, 1, size, tfile);
    fclose(tfile);
    free(buf);
}

However, ent reports that the arithmetic mean of the random data is around 127.05 instead of 127.5 (as on Linux). I am confident this is not a coincidence, since I reproduced it several times on different computers and the result was consistent. To investigate further, I wrote a Python script to count the frequency of each byte value (from 0 to 255).

f = open("random_file", "rb")
a = f.read()
f.close()

tmp = [0 for _ in range(256)]

for x in a:
    tmp[int(x)] += 1

print(tmp)

The result looks similar to this:

[4101, 4026, 4027, 4074, 4200, 4021, 4121, 4066, 4035, 3972, 4127, 4010, 
3978, 8214, 4009, 4155, 4083, 4065, 4067, 4064, 3993, 4021, 4136, 4112, 4221, 
4172, 4134, 4117, 3972, 4127, 4175, 4110, 4125, 4181, 4092, 4157, 4122, 4024, 
4020, 4088, 3980, 4140, 4159, 4129, 4064, 4141, 4096, 4238, 4036, 4080, 4151, 
4115, 4086, 4156, 4111, 4106, 4086, 4058, 4179, 4193, 4144, 4206, 4180, 4028, 
4148, 4015, 3979, 4201, 4098, 4146, 4169, 4120, 4044, 4066, 4049, 4051, 4051, 
4122, 4048, 4139, 4125, 4052, 4224, 4091, 4084, 4040, 4183, 4134, 3948, 4132, 
3955, 4162, 4183, 4014, 4100, 4091, 4005, 4146, 4182, 4032, 4037, 3985, 4098, 
4078, 4147, 4060, 4085, 4215, 4039, 4187, 4207, 4161, 4086, 4159, 4018, 4073, 
4051, 4008, 4095, 4110, 4160, 4288, 4077, 4074, 4113, 4104, 4097, 4115, 4049, 
3963, 4083, 4111, 4066, 4084, 4107, 4035, 3977, 4078, 4035, 4008, 3993, 4080, 
4152, 4121, 4111, 4033, 4094, 4191, 4131, 3978, 4082, 4134, 4119, 4135, 4071, 
3993, 3888, 4137, 4188, 4110, 4078, 4186, 4188, 4074, 4196, 4110, 4069, 4135, 
4043, 4150, 4023, 4095, 4074, 4179, 4112, 4084, 4124, 4180, 4154, 3996, 4103, 
4199, 4137, 4155, 4039, 4077, 4159, 4167, 4171, 4115, 4025, 4218, 4046, 4008, 
4178, 3969, 4135, 4077, 4044, 4080, 4085, 4230, 4161, 4151, 4056, 4222, 4033, 
4020, 4187, 4034, 4175, 4167, 3962, 4102, 4054, 3978, 4111, 4001, 4028, 4103, 
4088, 4054, 4049, 4164, 4136, 4110, 4181, 3964, 4098, 4046, 3997, 4151, 4122, 
4272, 4067, 4112, 4037, 4083, 4072, 4106, 4105, 4104, 4166, 4090, 4071, 4080, 
4070, 4087, 4162, 4060, 4237, 4061, 4044, 4128, 4051, 4097]

Here it is clear that 13 (the 14th number) occurs approximately twice as often as every other value, which would also explain the arithmetic mean of 127.05.
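As a rough sanity check of that claim, the expected mean can be worked out directly. This is a minimal sketch assuming an idealized histogram in which every byte value appears exactly 4096 times (2^20 bytes / 256 values) except 13, which appears twice as often; the names `per_value` and `counts` are illustrative only.

```python
# Idealized histogram (an assumption, not the measured data):
# every byte value appears per_value times, except 13, which is doubled.
per_value = 4096  # 2**20 bytes spread evenly over 256 values

counts = [per_value] * 256
counts[13] *= 2

total = sum(counts)
mean = sum(value * count for value, count in enumerate(counts)) / total
print(f"{mean:.4f}")  # prints 127.0545
```

The result, (32640 + 13) / 257, agrees with the 127.05 that ent reports.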

I am not sure whether this is a bug in CryptGenRandom or whether I am using it incorrectly, but I have tested it on both my 64-bit Windows 10 and 32-bit Windows 7 computers, and the result is consistent. Does anyone have an idea, or could anyone help investigate and confirm it?

lewisxy

1 Answer


FILE *tfile = fopen("random_file", "w");

You are opening the file in text mode. The buffer contains each byte value about 4,000 times, including the '\r' and '\n' characters.

Each time you write '\n', the program inserts an extra '\r' before it, so the file ends up with about 8,000 '\r' characters, whose ASCII value is 13.
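The effect can be reproduced on any platform by imitating the translation by hand. The sketch below is only a simulation: `text_mode_write` is a hypothetical helper that assumes text mode turns every LF byte into CR LF, and `os.urandom` stands in for CryptGenRandom.

```python
import os
from collections import Counter


def text_mode_write(data: bytes) -> bytes:
    """Hypothetical helper: imitate the LF -> CR LF translation of text mode."""
    return data.replace(b"\n", b"\r\n")


original = os.urandom(1 << 20)       # stand-in for the CryptGenRandom buffer
written = text_mode_write(original)  # what would end up in the file

before = Counter(original)
after = Counter(written)

# The file grows by one byte per LF in the buffer...
print(len(written) - len(original), before[10])
# ...and every one of those extra bytes is a CR (value 13).
print(before[13], after[13])
```

The count for 13 roughly doubles while every other value is unchanged, and the output is larger than 1 MiB, so checking the file size is another quick way to spot the problem.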

On Windows, you should open non-text files in binary mode: fopen("random_file", "wb").
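One way to confirm that binary mode writes the data untouched is a round-trip check. This is a minimal sketch in Python rather than C; the temporary directory and `os.urandom` are assumptions made so the example runs anywhere.

```python
import os
import tempfile

data = os.urandom(1 << 20)

# Write in binary mode ("wb"), the Python counterpart of fopen(..., "wb").
path = os.path.join(tempfile.mkdtemp(), "random_file")
with open(path, "wb") as f:
    f.write(data)

size_on_disk = os.path.getsize(path)
with open(path, "rb") as f:
    round_trip = f.read()

# Both checks should hold: same size, same bytes.
print(size_on_disk == len(data), round_trip == data)
```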

Barmak Shemirani
  • It works, thanks. But I am still curious why this problem doesn't happen on Linux. I used the same test code (with `"w"` as the mode for `fopen`). Is it because Linux treats `"w"` the same as `"wb"`, or because Linux uses only `LF` while Windows uses `CRLF`? – lewisxy Jun 22 '18 at 04:05
  • Yes, that's a Windows-specific issue. On Linux and other POSIX systems there is no hidden conversion, so there is no difference between binary mode and text mode. See also https://stackoverflow.com/questions/229924/difference-between-files-written-in-binary-and-text-mode – Barmak Shemirani Jun 22 '18 at 04:14