Why is MD5Sum so fast

Question

I've been studying hashing in C/C++ and tried to replicate the md5sum command in Linux. After analysing the source code, it seems that md5sum relies on the md5 library's md5_stream. I've approximated the md5_stream function from the md5.h library into the code below, and it runs in ~13-14 seconds. I've tried to call the md5_stream function directly and got ~13-14 seconds. The md5sum runs in 4 seconds. What have the GNU people done to get the speed out of the code?

The md5.h/md5.c code is available in the CoreUtils source code.

#include <QtCore/QCoreApplication>
#include <QtCore/QDebug>

#include <iostream>
#include <iomanip>
#include <fstream>
#include "md5.h"

#define BLOCKSIZE 32784

int main()
{
    FILE *fpinput, *fpoutput;

    if ((fpinput = fopen("/dev/sdb", "rb")) == 0) {
        throw std::runtime_error("input file doesn't exist");
    }

    struct md5_ctx ctx;
    size_t sum;

    char *buffer = (char*)malloc (BLOCKSIZE + 72);
    unsigned char *resblock = (unsigned char*)malloc (16);
    if (!buffer)
      return 1;

    md5_init_ctx (&ctx);
    size_t n;
    sum = 0;

    while (!ferror(fpinput) && !feof(fpinput)) {
        n = fread (buffer + sum, 1, BLOCKSIZE - sum, fpinput);
        if (n == 0){
            break;
        }
        sum += n;

        if (sum == BLOCKSIZE) {
            md5_process_block (buffer, BLOCKSIZE, &ctx);
            sum = 0;
        }
    }

    if (n == 0 && ferror (fpinput)) {
        free (buffer);
        return 1;
    }

    /* Process any remaining bytes.  */
    if (sum > 0){
      md5_process_bytes (buffer, sum, &ctx);
    }

    /* Construct result in desired memory.  */
    md5_finish_ctx (&ctx, resblock);
    free (buffer);

    for (int x = 0; x < 16; ++x){
        std::cout << std::setfill('0') << std::setw(2) << std::hex << static_cast<uint16_t>(resblock[x]);
        std::cout << " ";
    }
    std::cout << std::endl;
    free(resblock);
    return 0;
}

EDIT: Was a default mkspec problem in Fedora 19 64-bit.

When you build your code, are you building in debug mode? What optimization levels are you using? — Tim, Sep 05 '13 at 19:15
It's gnuutils... the source is available and you can look at it yourself. — Marc B, Sep 05 '13 at 19:15
Tim : Currently I'm building in Qt. By default the Qt build environment adds the -O2 build flag. I compile and test in Release mode, so debugging libraries and hooks should not be interfering. MarcB : I obviously know that it's GNUUtils if I'm looking at the md5sum source code. I am looking for insight in how or where they optimized their build. — MagikWorx, Sep 05 '13 at 19:17
you already have the sources, look at the makefile to figure what compiler options are used. — devnull, Sep 05 '13 at 19:23
Do you believe that the optimizations are all compile flags? — MagikWorx, Sep 05 '13 at 20:23
Yes, the Makefile(s) should contain all the compilation flags. Do a `make clean` followed by `make` and see what flags were passed to the compiler. — devnull, Sep 06 '13 at 05:11
Fedora x64 with the default Qt install uses the wrong mkspec file and doesn't use the optimization flags it should. linux g++ (-O) vs linux-64 g++(-O2/-O3). I deleted the generic linux mkspec and made a symbolic link to the proper linux-64 mkspec. All is well and it optimized properly. — MagikWorx, Sep 13 '13 at 20:31
You should write that solution as an answer, and then accept it. — hyde, Feb 17 '14 at 21:27

karl · Answer 1 · 2015-05-26T18:12:39.330

fread() is convenient, but don't use fread() if you care about performance. fread() will copy from the OS to a libc buffer, then to your buffer. This extra copying cost CPU cycles and cache.

For better performance use open() then read() to avoid the extra copy. Make sure your read() calls are multiples of the block size, but lower than your CPU cache size.

For best performance use mmap() map the disk directly to RAM.

If you try something like the below code, it should go faster.

//  compile  gcc  mmap_md5.c  -lgcrypt
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <gcrypt.h>
#include <linux/fs.h> // ioctl

#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)


int main(int argc, char *argv[])
    {
        char *addr;
        int fd;
        struct stat sb;
        off_t offset, pa_offset;
        size_t length;
        ssize_t s;
        unsigned char digest[16];
        char digest_ascii[32+1] = {0,};
        int digest_length = gcry_md_get_algo_dlen (GCRY_MD_MD5);
        int i;

        if (argc < 3 || argc > 4) {
            fprintf(stderr, "%s file offset [length]\n", argv[0]);
            exit(EXIT_FAILURE);
        }
        fd = open(argv[1], O_RDONLY);
        if (fd == -1)
            handle_error("open");
        if (fstat(fd, &sb) == -1)           /* To obtain file size */
            handle_error("fstat");
        offset = atoi(argv[2]);
        pa_offset = offset & ~(sysconf(_SC_PAGE_SIZE) - 1);


        if (sb.st_mode | S_IFBLK ) {
            // block device. use ioctl to find length
            ioctl(fd, BLKGETSIZE64, &length);

        } else  {
            /* offset for mmap() must be page aligned */
            if (offset >= sb.st_size) {
                fprintf(stderr, "offset is past end of file size=%zd, offset=%d\n", sb.st_size, (int) offset);
                exit(EXIT_FAILURE);
            }
            if (argc == 4) {
                length = atoi(argv[3]);
                if (offset + length > sb.st_size)
                    length = sb.st_size - offset;
                /* Canaqt display bytes past end of file */
            } else {    /* No length arg ==> display to end of file */
                length = sb.st_size - offset;
            }
        }
        printf("length= %zd\n", length);
        addr = mmap(NULL, length + offset - pa_offset, PROT_READ,
                                MAP_PRIVATE, fd, pa_offset);
        if (addr == MAP_FAILED)
            handle_error("mmap");


        gcry_md_hash_buffer(GCRY_MD_MD5, digest, addr + offset - pa_offset, length);

        for (i=0; i < digest_length; i++) {
            sprintf(digest_ascii+(i*2), "%02x", digest[i]);
        }
        printf("hash=%s\n", digest_ascii);

        exit(EXIT_SUCCESS);
}

have you actually measured the performance gain? On what type of file system? For sequential file access, I am wondering how mmap could perform significantly better than buffered reads. I would also have expected sequential fread to perform better than unbuffered read on the vast majority of file systems. — Come Raczy, May 26 '15 at 20:49
The original poster was checking a block device /dev/sdb so the file system type is irrelevant. File systems sit above block devices. I tested performance of mine vs md5sum and performance is about the same, my example is about 5% slower than md5sum but not over three times slower compared to original poster. If you want more debate on mmap. see http://stackoverflow.com/questions/45972/mmap-vs-reading-blocks — karl, May 26 '15 at 21:44
Sadly, it was a compilation problem in this case. Thank you for the code and insight. I will definitely check out the linked articles. — MagikWorx, Oct 22 '15 at 12:09

score 0 · Accepted Answer · answered Oct 22 '15 at 12:07

0

It turned out to be an error in the Qt mkspecs regarding an optimization flag not being set properly.

answered Oct 22 '15 at 12:07

MagikWorx

342
1
10

Why is MD5Sum so fast

2 Answers2