I want to calculate the SHA-256 hash of a file that is larger than 1 MB. To get this hash value with the mbedtls library, I need to copy the whole file into memory, but I only have 100 KB of memory. Is there a way to calculate the file hash in sections?
1 Answer
> In order to get this hash value with mbedtls library, I need to copy the whole file to the memory.
This is not accurate. The mbedtls library supports incremental calculation of hash values.
To calculate a SHA-256 hash with mbedtls, you would have to take the following steps (reference):
- Create an instance of the `mbedtls_sha256_context` struct.
- Initialize the context with `mbedtls_sha256_init` and then `mbedtls_sha256_starts_ret`.
- Feed data into the hash function with `mbedtls_sha256_update_ret`.
- Calculate the final hash sum with `mbedtls_sha256_finish_ret`.
- Free the context with `mbedtls_sha256_free`.
Note that this does not mean that the `mbedtls_sha256_context` struct holds the entire data until `mbedtls_sha256_finish_ret` is called. Instead, `mbedtls_sha256_context` only holds the intermediate result of the hash calculation. When feeding additional data into the hash function with `mbedtls_sha256_update_ret`, the state of the calculation is updated and the new intermediate result is stored in the `mbedtls_sha256_context`.
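To illustrate the idea of a fixed-size running state without depending on mbedtls, here is a minimal sketch that uses FNV-1a, a simple non-cryptographic hash, as a stand-in for SHA-256. All `toy_hash_*` names are hypothetical and are not part of mbedtls; the sketch only mirrors the starts/update/finish pattern and shows that the chunk size does not affect the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy context: like mbedtls_sha256_context, it stores only
 * fixed-size running state, never the data that was hashed. */
typedef struct {
    uint64_t state;
} toy_hash_context;

/* Start a new calculation with the FNV-1a 64-bit offset basis. */
void toy_hash_starts(toy_hash_context *ctx)
{
    ctx->state = 0xcbf29ce484222325ULL;
}

/* Fold len more bytes into the running state. */
void toy_hash_update(toy_hash_context *ctx, const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        ctx->state ^= data[i];
        ctx->state *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
}

/* Return the final value. Unlike SHA-256, FNV-1a needs no padding step. */
uint64_t toy_hash_finish(const toy_hash_context *ctx)
{
    return ctx->state;
}

/* Hash msg by feeding it in pieces of at most chunk bytes (chunk > 0). */
uint64_t toy_hash_chunked(const unsigned char *msg, size_t len, size_t chunk)
{
    toy_hash_context ctx;
    assert(chunk > 0);
    toy_hash_starts(&ctx);
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? (len - off) : chunk;
        toy_hash_update(&ctx, msg + off, n);
    }
    return toy_hash_finish(&ctx);
}
```

Calling `toy_hash_chunked(msg, len, 1)` and `toy_hash_chunked(msg, len, len)` on the same message returns the same value, because each update only depends on the previous state and the new bytes. SHA-256 in mbedtls follows the same principle, with a larger state and a 64-byte block buffer.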
The total size of a `mbedtls_sha256_context`, as determined by `sizeof(mbedtls_sha256_context)`, is 108 bytes on my system. We can also see this from the mbedtls source code (reference):
typedef struct mbedtls_sha256_context
{
    uint32_t total[2];          /*!< The number of Bytes processed. */
    uint32_t state[8];          /*!< The intermediate digest state. */
    unsigned char buffer[64];   /*!< The data block being processed. */
    int is224;                  /*!< Determines which function to use:
                                     0: Use SHA-256, or 1: Use SHA-224. */
}
mbedtls_sha256_context;
We can see that the struct holds a counter of size 2 * 32 bits = 8 bytes that keeps track of the total number of bytes processed so far. 8 * 32 bits = 32 bytes are used to track the intermediate result of the hash calculation. 64 bytes are used to hold the current data block being processed. As you can see, this is a fixed-size buffer that does not grow with the amount of data being hashed. Finally, an `int` is used to distinguish between SHA-224 and SHA-256. On my system, `sizeof(int) == 4`. So in total, we get 8 + 32 + 64 + 4 = 108 bytes.
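The arithmetic can be checked with a small sketch. It uses a hypothetical local replica of the struct quoted above (named `sha256_context_replica` here) so that it compiles without the mbedtls headers; the real struct may differ between mbedtls versions and build configurations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical replica of the struct quoted above; not the mbedtls type. */
typedef struct {
    uint32_t total[2];        /* byte counter:       2 * 4 =  8 bytes */
    uint32_t state[8];        /* intermediate state: 8 * 4 = 32 bytes */
    unsigned char buffer[64]; /* current data block:         64 bytes */
    int is224;                /* SHA-224/SHA-256 switch: sizeof(int)  */
} sha256_context_replica;

/* Sum of the member sizes. This equals the size of the whole struct only
 * if the compiler inserts no padding between or after the members. */
size_t sha256_context_member_bytes(void)
{
    sha256_context_replica ctx;
    return sizeof ctx.total + sizeof ctx.state
         + sizeof ctx.buffer + sizeof ctx.is224;
}
```

On a platform where `int` is 4 bytes and has 4-byte alignment, `sha256_context_member_bytes()` and `sizeof(sha256_context_replica)` both evaluate to 108. Neither value depends on how much data is hashed.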
Consider the following example program, which reads a file step by step into a buffer of size 4096 and feeds the buffer into the hash function in each step:
#include <mbedtls/sha256.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFFER_SIZE 4096
#define HASH_SIZE 32

// Note: the _ret names belong to the Mbed TLS 2.x API. In Mbed TLS 3.x the
// same functions are available without the _ret suffix.
int main(void) {
    int ret = EXIT_FAILURE;
    FILE *fp = NULL;
    uint8_t buffer[BUFFER_SIZE];
    uint8_t hash[HASH_SIZE];
    size_t n;

    // Initialize hash
    mbedtls_sha256_context ctx;
    mbedtls_sha256_init(&ctx);
    if (mbedtls_sha256_starts_ret(&ctx, /*is224=*/0) != 0) {
        goto exit;
    }

    // Open file in binary mode so that no bytes are translated
    fp = fopen("large_file", "rb");
    if (fp == NULL) {
        goto exit;
    }

    // Read file in chunks of size BUFFER_SIZE
    while ((n = fread(buffer, 1, BUFFER_SIZE, fp)) > 0) {
        if (mbedtls_sha256_update_ret(&ctx, buffer, n) != 0) {
            goto exit;
        }
    }
    // fread returns 0 on end of file and on error, so tell them apart
    if (ferror(fp)) {
        goto exit;
    }

    // Calculate final hash sum
    if (mbedtls_sha256_finish_ret(&ctx, hash) != 0) {
        goto exit;
    }

    // Simple debug printing. Use MBEDTLS_SSL_DEBUG_BUF in a real program.
    for (size_t i = 0; i < HASH_SIZE; i++) {
        printf("%02x", hash[i]);
    }
    printf("\n");
    ret = EXIT_SUCCESS;

exit:
    // Cleanup
    if (fp != NULL) {
        fclose(fp);
    }
    mbedtls_sha256_free(&ctx);
    return ret;
}
When running this program on a large sample file, the following behavior can be observed:
$ dd if=/dev/random of=large_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 5.78353 s, 177 MB/s
$ sha256sum large_file
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216 large_file
$ gcc -O3 -static test.c /usr/lib/libmbedcrypto.a
$ ./a.out
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216
We can see that the program calculates the correct SHA-256 hash. We can also inspect the memory used by the program:
$ command time -v ./a.out
...
Maximum resident set size (kbytes): 824
...
We can see that the program consumed at most 824 KB of memory. Thus, we have calculated the hash of a 1 GB file with less than 1 MB of memory. This shows that we do not have to load the entire file into memory at once to calculate its hash with mbedtls.
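The same figure can also be read from inside a process. The following is a minimal sketch, assuming a POSIX system that provides `getrusage`; the helper name `peak_rss` is hypothetical, and the unit of `ru_maxrss` is platform dependent (kilobytes on Linux).

```c
#include <sys/resource.h>

/* Peak resident set size of the calling process, or -1 on error.
 * Unit: kilobytes on Linux; other systems may use a different unit. */
long peak_rss(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```

Printing the result of `peak_rss()` just before the program exits should give a value comparable to the "Maximum resident set size" line reported by `time -v`.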
Keep in mind that this measurement was done on a 64-bit desktop computer, not an embedded platform. Also, no further optimizations were performed besides `-O3` and static linking (the latter approximately halved the memory usage of the program). I would expect the memory footprint to be even smaller on an embedded device with a smaller address size and a tool chain performing further optimizations.
