
I have the following code to calculate the SHA-256 of an input, and I wonder how I can find the optimal data chunk size for each call to CC_SHA256_Update. Is it a constant value, or does it depend on the system environment?

 CC_SHA256_CTX sha256;
 CC_SHA256_Init(&sha256);

 const size_t bufSize = 32768; // how can I find the optimal size?
 char *buffer = malloc(bufSize);

 if (!buffer) {
     return -1;
 }

 size_t bytesRead;
 // fread returns the number of bytes read; 0 means end of file or error
 while ((bytesRead = fread(buffer, 1, bufSize, file)) > 0) {
     CC_SHA256_Update(&sha256, buffer, (CC_LONG) bytesRead);
 }
 free(buffer);

EDIT: I tried a different approach, as described in the selected answer below, and acquired the data using mmap (rather than malloc + fread). Unfortunately, it didn't improve run-time efficiency (the run time slightly increased).

int fsize(const char *filename) {
    struct stat st; 
    if (stat(filename, &st) == 0)
        return st.st_size;
    return -1; 
}

int fd = open(path, O_RDONLY);

int sz = fsize(path);  
char * buffer = mmap((caddr_t)0, sz, PROT_READ  , MAP_SHARED, fd, 0);

CC_SHA256_CTX sha256;
CC_SHA256_Init(&sha256);

CC_SHA256_Update(&sha256, buffer, sz);

CC_SHA256_Final(output, &sha256);

close(fd);
return 0;
Zohar81
    The underlying file system cache manager will likely read in optimally sized chunks, so you don't have to worry too much about it. Benchmarking will probably be the best tool. Also see [optimal buffer size for reading file in C](http://stackoverflow.com/q/13433286), [Optimum file buffer read size](http://stackoverflow.com/q/1552107), [What is the best buffer size when using BinaryReader to read big files](http://stackoverflow.com/q/19558435), ... – jww Jan 04 '16 at 15:55

1 Answer


I think only testing with different sizes will settle that, but multiples of 64 kB (the allocation granularity) might be preferred.
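A rough way to run such a test is sketched below. It is only an illustration, not a definitive benchmark: `toy_update` is a made-up stand-in (a 64-bit FNV-1a step) so the code compiles without CommonCrypto, and the function names and candidate sizes are assumptions. For a real measurement you would replace the stand-in with `CC_SHA256_Update` and repeat each run several times.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for CC_SHA256_Update: one FNV-1a pass over the data.
   It exists only so this sketch builds without CommonCrypto. */
void toy_update(uint64_t *state, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        *state ^= p[i];
        *state *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    }
}

/* Read the file in chunks of buf_size bytes and feed each chunk to the hash.
   Returns the processor time used in seconds, or -1.0 on error. */
double time_chunked_hash(const char *path, size_t buf_size, uint64_t *digest) {
    assert(buf_size > 0);
    FILE *file = fopen(path, "rb");
    if (!file) return -1.0;
    unsigned char *buffer = malloc(buf_size);
    if (!buffer) { fclose(file); return -1.0; }

    uint64_t state = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
    clock_t start = clock();
    size_t n;
    while ((n = fread(buffer, 1, buf_size, file)) > 0)
        toy_update(&state, buffer, n);
    clock_t end = clock();

    int failed = ferror(file);
    free(buffer);
    fclose(file);
    if (failed) return -1.0;
    *digest = state;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Print one timing line per candidate buffer size (sizes chosen arbitrarily). */
void compare_sizes(const char *path) {
    const size_t sizes[] = {4096, 32768, 65536, 262144, 1048576};
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        uint64_t digest = 0;
        double secs = time_chunked_hash(path, sizes[i], &digest);
        printf("%8zu bytes: %.4f s\n", sizes[i], secs);
    }
}
```

Note that `clock()` reports processor time, not wall-clock time, and that the OS file cache will make every run after the first one faster, so the first pass over a file is not comparable to later ones.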

But for best performance, you might consider memory-mapping the file directly. That would eliminate the need to copy all the data from kernel mode (the OS disk cache) to user mode. You would be accessing the OS cache directly, and you would probably need to call CC_SHA256_Update() only once.
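A minimal sketch of the memory-mapped variant could look like the following. The names `hash_mapped_file`, `update_fn` and `sum_bytes` are hypothetical; the callback stands in for the real hash update so the sketch does not depend on CommonCrypto. It checks the return value of `mmap()` and releases the mapping with `munmap()`.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback type standing in for the hash update call. */
typedef void (*update_fn)(void *ctx, const void *data, size_t len);

/* Map the file read-only and hand the whole mapping to `update` in one call.
   Returns 0 on success, -1 on error. */
int hash_mapped_file(const char *path, update_fn update, void *ctx) {
    assert(update != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; } /* mmap rejects length 0 */

    size_t size = (size_t)st.st_size;
    void *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED) { perror("mmap"); return -1; }

    update(ctx, data, size);

    return munmap(data, size); /* 0 on success, -1 on failure */
}

/* Example callback for trying the sketch out: adds up the bytes it is given. */
void sum_bytes(void *ctx, const void *data, size_t len) {
    unsigned long *total = ctx;
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) *total += p[i];
}
```

With CommonCrypto, the callback would be a thin wrapper that casts `ctx` to `CC_SHA256_CTX *` and calls `CC_SHA256_Update`. Its length parameter is a `CC_LONG`, which is a 32-bit type, so a mapping larger than 4 GiB would have to be fed in several calls rather than one.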

Danny_ds
  • ***`Plus One`*** for memory mapped I/O – jww Jan 04 '16 at 15:56
  • Hi, unfortunately the mmap approach didn't make the code more efficient. Perhaps you could look at my implementation above and let me know if you think I can improve it. Thanks – Zohar81 Jan 05 '16 at 15:02
  • @Zohar81 - Depending on the (processor) time SHA256 takes, the speed gain might be small (also considering this is a one-time, forward-only read, which is already optimized by the OS), but it shouldn't take longer. But try using `MAP_PRIVATE` instead of `MAP_SHARED`. Also, don't forget to call `munmap()` (and to check the return value of `mmap()`). – Danny_ds Jan 05 '16 at 15:25