8

I am looking for a hash function, that can hash a list of non-repeating integers while ignoring the order of them.

Example

I want the two lists

l1 = [0, 1, 3, 7]
l2 = [7, 3, 1, 0]

to have the same hash.

Background

I have an algorithm that finds a list of vertices on a graph. In an undirected graph, the algorithm will find certain lists multiple times in different orders. With my current understanding of the algorithm, it is easier to filter out the duplicates rather than re-inventing the algorithm. For performance reasons, I understand it to be easier to hash the found lists of vertices rather than comparing the whole lists.

Possible answers

Now, I see that

  • an XOR or a simple sum might be an answer.
    Unfortunately, both offer too much potential for hash collisions, as I see it.
  • The not-very-efficient working method is to sort a list, and then use this sorted list to compare the new list (also sorted) against.

Other Thoughts

Given that

  • The lists contain only integers.
  • The integers will be the vertex indices, and the graph can have billions of vertices.
  • The integers in a list are non-repeating, and their order doesn't matter.
  • The lists can and will consist of between 2 and 100 (and in some cases > 1000) entries.
  • No need for cryptographically-secure randomness.

I have this feeling that there should be a relatively easy and straight-forward answer, and I just have not found it.

ikegami
  • 367,544
  • 15
  • 269
  • 518
  • 2
    The lists can – and will – be between 2 and > 100 (in some cases > 1000) entries long. – BernhardWebstudio Dec 20 '22 at 14:43
  • 2
    The integers will be the vertex indices, i.e., at max equal to the number of vertices in the graph, which can go to the billions, though mostly rather millions I guess – BernhardWebstudio Dec 20 '22 at 14:44
  • 4
    Sounds like the length of the list would be a good differentiator for the longer lists. Using that as the hash on its own would not be good, but it could be an important component. – ikegami Dec 20 '22 at 14:50
  • 3
    Maybe you can combine xor, sum, multiplication and the list length. The hash may overflow of course but that does not matter. – Sven Nilsson Dec 20 '22 at 15:01
  • 2
    @SvenNilsson *The hash may overflow of course but that does not matter.* It only would not matter if the hash is *un*signed. If the hash is a signed integer value, overflow is undefined behavior. – Andrew Henle Dec 20 '22 at 15:08
  • 3
    That is correct, but any sensible programmer would make a hash unsigned. And I'm happy that the answer below is in line with what I suggested. – Sven Nilsson Dec 20 '22 at 15:16

3 Answers3

8

Use a combination of the product, sum and ^. All are communitive (order independent) with unsigned math.

unsigned long long product = 1;
unsigned sum = 0;  // Maybe unsigned long long
unsigned x = 0;
for (i=0; i < array_element_count; i++) {
  product *= l[i];
  sum += l[i];
  x ^= l[i];
}
unsigned long long pre_hash = product + sum + ((unsigned long long) x << 32));
unsigned hash = pre_hash % hash_table_size;

Tip: hash_table_size should be a prime to effectively use all pre_hash bits.


If array_element_count was high, I would consider p *= shift_right_until_odd(l[i]), else p will too often become 0.

If l[i] == 0 p *= l[i] deserves something different. A simple mitigation is p *= l[i] | 1, but that is something pulled out of the air.

Hashing takes time for good design and the above are candidate building blocks for OP.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • The modular operator is slow and not necessary to make a hash. – Simon Goater Dec 20 '22 at 15:30
  • 2
    @SimonGoater The extra 10-15 clock cycles in the instruction pipeline for the modulo operation are probably not even measurable given events such as IO, pipeline stalls, cache misses, and register spills and fills will utterly dominate performance. – Andrew Henle Dec 20 '22 at 15:34
  • You don't know that. It depends how it is used. – Simon Goater Dec 20 '22 at 15:36
  • 2
    @SimonGoater You're engaging in premature optimization without evidence. And without actual evidence obtained from detailed profiling of an actual implementation that the modulo operation in question could cause something like a cache miss or register spill/fill cycle, decades of experience doing actual performance optimizations tells me it **won't** because it's a 10-15 CPU cycle addition to a long pipeline of instructions with no branching that's probably several hundred CPU cycles long in any real-world implementation. – Andrew Henle Dec 20 '22 at 15:44
  • 1
    True `%` is not _necessary_ to make a hash, yet can readily make for a [better hash](https://stackoverflow.com/a/32915815/2410359). Depending on the data set and details of forming the _pre_hash_, that `%` can reduce collisions, thus making for better performance. We need to rate the value of creating dispersal against its work. – chux - Reinstate Monica Dec 20 '22 at 15:51
  • My comments are factual. If anything, @Andrew Henle is making assumptions about performance. Using a slow operation to make a weak hash less weak doesn't seem better than making a better hash to me. Also, using product makes the end product zero if ANY of the elements is zero so it doesn't add good dispersal. Also if there are more factors of 2 in the ALL the numbers in the list than can fit in product, it will be zero. – Simon Goater Dec 20 '22 at 16:05
  • 1
    @SimonGoater When the pre_hash function is not coded, but passed in as a function pointer, the hash routine do not have access to recoding to "making a better hash". Also consider dynamic growing hash tables where `hash_table_size` varies. "if ANY of the elements is zero" is a good concern. " there are more factors of 2 " already address in answers "consider `p *= shift_right_until_odd(l[i])`". – chux - Reinstate Monica Dec 20 '22 at 16:10
  • @SimonGoater In my experience, hash support libraries kick back the pre_hash function to the user (albeit with some candidate function for the usual suspects). Highly optimized hashing is non-trivial development and can then, as you suggest omit the general `%`. Yet with general purpose hash table libraries, the `%` is a useful improvement. As as time has gone on, `%` is less expensive than is was the old days. – chux - Reinstate Monica Dec 20 '22 at 16:20
  • The OP only wants to 'filter out duplicates' so the range of values returned by the hash isn't as important as it would be if it were used for a hash table for instance. When the array_element_count is typically only 2 to 100 elements, I think the speed of the modulus operator is something that should be considered. – Simon Goater Dec 20 '22 at 16:24
  • @SimonGoater Some high speed hash tables used a reduced set of possible sizes - a few dozen. This allows code like `hash = pre_hash % some_constant` per size. A good compiler can use optimized code rather than a general `% size` and well handle the speed issue you raised. – chux - Reinstate Monica Jan 06 '23 at 05:22
1

Any CRC will do the job. Just XOR (I have used 64bit numbers, but 32bits crc, but it should work also with full 64 xor/crc or 32bit xor/crc) the elements together (to eliminate any order between them, as the XOR operation is conmutative, you eliminate the dependency on the order) mod 2&31, then take a CRC32 of the result (that will spread the set of values uniformly, as it warrants ---or tries to--- that a change in one bit will affect half of the bits in the result) See here for sample code and several crc tables. The repository is BSD license, so you can use it as desired.

Below is a sample implementation that generates random lists, and reorders them, comparing their hashes:

crc32ieee8023.h

#ifndef CRC32IEEE8023_H
#define CRC32IEEE8023_H

#include "crc.h"

extern CRC_STATE crc32ieee8023[];

#endif /* CRC32IEEE8023_H */

crc.h

#ifndef CRC_H
#define CRC_H

#include <stdlib.h>
#include <stdint.h>

#define CRC_TABLE_SIZE  256
#define CRC_BYTE_SIZE   8
#define CRC_BYTE_MASK   0xff

typedef uint8_t         CRC_BYTE;
typedef uint64_t        CRC_STATE;

CRC_STATE do_crc(
    CRC_STATE  state,
    CRC_BYTE  *buff,
    size_t     nbytes,
    CRC_STATE *table);

#endif /* CRC_H */

test_xor_crc_hash.c

(This is the important file, where all the stuff is included.)

/* test_crc_table -- program to test a crc hash algorithm that
 * checks a list of numbers and generates the same crc in a form
 * that is independent on the list order presented.
 * Program generates a list of random numbers (32bit) then it
 * generates a random permutation of the list and a sorted list,
 * calculates the hash over the three lists, and compares them.
 */
#include <errno.h>
#include <fcntl.h>
#include <getopt.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "crc.h"
#include "crc32ieee8023.h"

#define DFLT_N          10
#define RANDOM_DEV      "/dev/urandom"

int long_compare(
        const void *_a,
        const void *_b);

void print(
        const char *name,
        const uint64_t *v,
        int         vsz,
        CRC_STATE   crc,
        uint64_t        xor);

int
main(int argc, char **argv)
{
    int opt;
    int n = DFLT_N,
        res;
    /* process options */
    while ((opt = getopt(argc, argv, "n:")) != EOF) {
        switch (opt) {
        case 'n': res = sscanf(optarg, "%u", &n);
                  if (res != 1) {
                      fprintf(stderr,
                              "%s: invalid format (-n)\n",
                              optarg);
                  }
                  break;
        } /* switch */
    } /* while */

    /* initialization of random number generator */
    unsigned short random_state[3];
    int fd = open(RANDOM_DEV, O_RDONLY);
    if (fd < 0) {
        fprintf(stderr,
                "open: %s: %s\n",
                RANDOM_DEV, strerror(errno));
        exit(EXIT_FAILURE);
    }
    res = read(fd, random_state, sizeof random_state);
    if (res < 0) { /* error */
        fprintf(stderr,
                "read: %s: %s\n",
                RANDOM_DEV, strerror(errno));
        exit(EXIT_FAILURE);
    }
    if (res < sizeof random_state) {
        fprintf(stderr,
                "read: %s: incomplete read (%d/%zd)\n",
                RANDOM_DEV, res, sizeof random_state);
        exit(EXIT_FAILURE);
    }
    seed48(random_state);
    close(fd);

    /* generate a list of random numbers and make two copies */
    uint64_t *original    = calloc(n, sizeof *original),
             *copy_sorted = calloc(n, sizeof *copy_sorted),
             *random_sort = calloc(n, sizeof *random_sort);

    /* make two copies */
    for (int i = 0; i < n; i++) {
        original[i] = copy_sorted[i]
                    = random_sort[i]
                    = (long)lrand48() | ((long)lrand48() << 32);
    }

    /* sort the numbers */
    qsort(copy_sorted, n, sizeof *copy_sorted, long_compare);

    /* and random permutation */
    for (int i = 0; i < n-1; i++) {
        int j = lrand48() % (n - i);
        if (i != j) {
            uint64_t temp  = random_sort[i];
            random_sort[i] = random_sort[j];
            random_sort[j] = temp;
        }
    }

    /* calculate the sorts */
    uint64_t xor_original = 0, xor_sorted = 0, xor_random = 0;
    for (int i = 0; i < n; i++) {
        xor_original ^= original[i];
        xor_sorted   ^= copy_sorted[i];
        xor_random   ^= random_sort[i];
    }

    /* now, calculate the crc's (a crc64 would be better for long) */
    CRC_STATE
        crc_original = do_crc(0xffffffff, (unsigned char *)&xor_original,
                sizeof xor_original, crc32ieee8023),
        crc_sorted   = do_crc(0xffffffff, (unsigned char *)&xor_sorted,
                sizeof xor_sorted,   crc32ieee8023),
        crc_random   = do_crc(0xffffffff, (unsigned char *)&xor_random,
                sizeof xor_random,   crc32ieee8023);
    print("original", original,    n, crc_original, xor_original);
    print("  sorted", copy_sorted, n, crc_sorted,   xor_sorted);
    print("  random", random_sort, n, crc_random,   xor_random);

    if (crc_original != crc_sorted || crc_sorted != crc_random) {
        fprintf(stderr, "crc's don't match (crc_original == 0x%08lx, "
                "crc_sorted == 0x%08lx, crc_random == 0x%08lx)\n",
                crc_original, crc_sorted, crc_random);
    }

    /* change only one bit in one element to see how it changes the hash */
    int bit_to_change        = lrand48()     % (n * 64),
        elem_to_change       = bit_to_change % n;

    bit_to_change            %= 64;
    original[elem_to_change] ^= (1UL << bit_to_change); /* change the bit */

    /* we should do the calculation over all elements, but just
     * changing a bit in one element will change just the same bit in the
     * xor_original accumulation variable */
    uint64_t xor_original_new = xor_original;
    xor_original_new         ^= (1UL << bit_to_change);

    printf("element=%d, bit=%d\n", elem_to_change, bit_to_change);
    uint64_t crc_original_new = do_crc(0xffffffff, (unsigned char *)&xor_original_new, sizeof xor_original_new, crc32ieee8023);
    print(" chg1bit", original, n, crc_original_new, xor_original_new);
    
}

int long_compare(const void *_a, const void *_b)
{
    const uint64_t *a = _a, *b = _b;
    return *a == *b
        ? 0
        : *a > *b
            ? +1
            : -1;
}

void print(const char *name, const uint64_t *v, int vsz, CRC_STATE crc, uint64_t xor)
{

    printf("%s: { ", name);
    char *sep = "";
    for (int i = 0; i < vsz; i++) {
        printf("%s0x%016lx", sep, v[i]);
        sep = ", ";
    }
    printf(" }\n"
          "    xor = 0x%016lx, crc = 0x%08lx\n",
          xor, crc);
}

crc.c

#include <sys/types.h>
#include "crc.h"

/* table based CRC calculation */
CRC_STATE do_crc(
    CRC_STATE  state,
    CRC_BYTE  *buff,
    size_t     nbytes,
    CRC_STATE *table)
{
    CRC_STATE index;

    while (nbytes--) {
        state  ^= *buff++;
        index   = state & CRC_BYTE_MASK;
        state >>= CRC_BYTE_SIZE;
        state  ^= table[index];
    } /* while */

    return state;
} /* do_crc */

crc32ieee8023.c

#include "crc.h"

/* variables */

CRC_STATE crc32ieee8023[] = {
    /* Comando usado: mkcrc -gpedb88320 */
    /* Polinomio: x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 */
    /*   0 */ 0x0, 0x77073096, 0xee0e612c, 0x990951ba,
    /*   4 */ 0x76dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
    /*   8 */ 0xedb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
    /*  12 */ 0x9b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
    /*  16 */ 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
    /*  20 */ 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
    /*  24 */ 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec,
    /*  28 */ 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
    /*  32 */ 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
    /*  36 */ 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
    /*  40 */ 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940,
    /*  44 */ 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
    /*  48 */ 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116,
    /*  52 */ 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
    /*  56 */ 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
    /*  60 */ 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
    /*  64 */ 0x76dc4190, 0x1db7106, 0x98d220bc, 0xefd5102a,
    /*  68 */ 0x71b18589, 0x6b6b51f, 0x9fbfe4a5, 0xe8b8d433,
    /*  72 */ 0x7807c9a2, 0xf00f934, 0x9609a88e, 0xe10e9818,
    /*  76 */ 0x7f6a0dbb, 0x86d3d2d, 0x91646c97, 0xe6635c01,
    /*  80 */ 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
    /*  84 */ 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
    /*  88 */ 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c,
    /*  92 */ 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
    /*  96 */ 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
    /* 100 */ 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
    /* 104 */ 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
    /* 108 */ 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
    /* 112 */ 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
    /* 116 */ 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
    /* 120 */ 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4,
    /* 124 */ 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
    /* 128 */ 0xedb88320, 0x9abfb3b6, 0x3b6e20c, 0x74b1d29a,
    /* 132 */ 0xead54739, 0x9dd277af, 0x4db2615, 0x73dc1683,
    /* 136 */ 0xe3630b12, 0x94643b84, 0xd6d6a3e, 0x7a6a5aa8,
    /* 140 */ 0xe40ecf0b, 0x9309ff9d, 0xa00ae27, 0x7d079eb1,
    /* 144 */ 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe,
    /* 148 */ 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
    /* 152 */ 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
    /* 156 */ 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
    /* 160 */ 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252,
    /* 164 */ 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
    /* 168 */ 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60,
    /* 172 */ 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
    /* 176 */ 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
    /* 180 */ 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
    /* 184 */ 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04,
    /* 188 */ 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
    /* 192 */ 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x26d930a,
    /* 196 */ 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x5005713,
    /* 200 */ 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0xcb61b38,
    /* 204 */ 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0xbdbdf21,
    /* 208 */ 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e,
    /* 212 */ 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
    /* 216 */ 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
    /* 220 */ 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
    /* 224 */ 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
    /* 228 */ 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
    /* 232 */ 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
    /* 236 */ 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
    /* 240 */ 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6,
    /* 244 */ 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
    /* 248 */ 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
    /* 252 */ 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d,
}; /* crc32ieee8023 */

Makefile

targets       = test_xch
toclean       = $(targets)

test_xch_deps =
test_xch_objs = crc32ieee8023.o crc.o test_xor_crc_hash.o
test_xch_libs =
test_xch_ldfl =
toclean      += $(test_xch_objs)

all: $(targets)
clean:
    $(RM) $(toclean)

test_xch: $(test_xch_deps) $(test_xch_objs)
    $(CC) $(LDFLAGS) $($@_ldfl) -o $@ $($@_objs) $($@_libs) $(LIBS)

To make the program, just run:

$ make

and to run it, you can use option -n that allows you to specify the number of random elements to generate.

Luis Colorado
  • 10,974
  • 1
  • 16
  • 31
-1

I think you will have to invent one to avoid the slow sorting option. In addition to XOR and arithmetic addition, there are bit rotations, and bit masks you could use. If you need high collision resistance, you could just combine more than one of the hash functions. e.g. Assuming the d_i and arithmetic are modular like with uint32_t for example,

H_1 = sum_{i = 1 to n} d_i
H_2 = xor_{i = 1 to n} d_i
H_3 = xor_{i = 1 to n} (rotl(d_i, d_i & 0x1f) + c)

Then take H1H2H3 as a 12 byte hash.

Simon Goater
  • 759
  • 1
  • 1
  • 7
  • 1
    Note 1) `rotl(d_i, d_i & 0x1f)` generate unbalanced results. A 4 bit example makes values (sorted) { 0, 2, 3, 4, 8, 8, 9, 9, A, A, B, B, B, C, D, F }. Of course `+`, `*` have their share of non-uniform distribution too. 2) On simple embedded processors, still quite common these days, `rotl(x, n)` may cost `n` cycles. As usual, any real performance issues deserve profiling. – chux - Reinstate Monica Dec 20 '22 at 16:51
  • If you were rotating 4 bits the equivalent would be rotl(n, n & 3). The 0x1f is for 32 bit, and you would use 0x3f for 64 bit. You appear to have misunderstood the purpose of the 0x1f. – Simon Goater Dec 20 '22 at 17:15
  • I used `for (unsigned i = 0; i < 16; i++) { unsigned y = rotl4(i, i & 3u); y &= 0xF; printf("%X %X\n", i, y); }` and `unsigned rotl4(unsigned x, unsigned shift) { x &= 0xF; shift &= 3u; static const unsigned msb = 8u; while (shift-- > 0) { if (x & msb) { x = (x << 1) | 1; } else { x = (x << 1); } x &= 0xF; } return x & 0xF; }` and that formed `0 0 1 2 2 8 3 9 4 4 5 A 6 9 7 B 8 8 9 3 A A B D C C D B E B F F`. Do you receive the same? – chux - Reinstate Monica Dec 20 '22 at 17:19
  • Use return ((x << shift) & 0xf) | (x >> (4 - shift)). – Simon Goater Dec 20 '22 at 17:24
  • Same biased result. Did you try [code](https://stackoverflow.com/questions/74864695/what-is-the-safest-way-to-hash-a-list-of-non-repeating-integers-without-taking-o/74865064?noredirect=1#comment132122896_74865266) with your fix? – chux - Reinstate Monica Dec 20 '22 at 17:34
  • I think your code is faulty. From printf("%i ", rotl4(i, i & 3)); I get 0 2 8 9 4 10 9 11 8 3 10 13 12 11 11 15. This function was only an example I picked out of thin air. What are you trying to prove? – Simon Goater Dec 20 '22 at 17:36
  • Code is not faulty. That is the same (hex output vs decimal). So you are getting unbalanced return values too. "What are you trying to prove" --> "rotl(d_i, d_i & 0x1f) generate unbalanced results.". This all helps support "Hashing takes time for good design". – chux - Reinstate Monica Dec 20 '22 at 17:39
  • By 'unbalanced', do you mean it isn't a permutation. I never thought it would be. It doesn't have to be. The bitwise operations in most hash functions like md5, sha256 etc are not all permutations. It is the same as your output, I didn't see how you were formatting it. – Simon Goater Dec 20 '22 at 17:50
  • 'Hashing takes time for good design' was never in question. I said 'you' would have to invent one. – Simon Goater Dec 20 '22 at 17:52