Git blob IDs are computed by prepending "blob $DecimalMessageLength\0" to the message and then taking the SHA-1 of the prefixed message.

Given the properties of the SHA-1 algorithm, is it possible to do this in a streaming fashion, i.e., to supply the prefix only after the message body has been hashed?

C example below (link with -lcrypto, with libssl-dev installed; it's probably not very useful, since this API doesn't even expose the internals of the SHA-1 algorithm, but I was playing around):

#include <openssl/sha.h>
#include <stdio.h>
#include <stdlib.h>

int pr_dgst(unsigned char const Dgst[static SHA_DIGEST_LENGTH])
{
    char const digits[]="0123456789abcdef";
    char digest_pr[(SHA_DIGEST_LENGTH)*2+1];
    for(size_t i=0;i<SHA_DIGEST_LENGTH;i++){
        digest_pr[i*2+0]=digits[Dgst[i]/16];
        digest_pr[i*2+1]=digits[Dgst[i]%16];
    }
    digest_pr[(SHA_DIGEST_LENGTH)*2]='\0';
    return puts(digest_pr);
}

int main(void)
{
    //reference value computed by git itself
    system("echo gitsha; printf '%s' 'abc' | git hash-object --stdin");
    #define STR_STRLEN(A) A, (sizeof(A)/sizeof(*(A))-1) //paste string literal and its length

    unsigned char digest[SHA_DIGEST_LENGTH];
    SHA_CTX ctx;
    //one-shot: header and body hashed as a single message
    SHA1_Init(&ctx); SHA1_Update(&ctx,STR_STRLEN("blob 3\0abc")); SHA1_Final(digest,&ctx);
    pr_dgst(digest); //prints the same as the system command

    //do this in a streaming fashion??
    SHA1_Init(&ctx);
    size_t len = 0;
    SHA1_Update(&ctx,STR_STRLEN("a")); len++;
    SHA1_Update(&ctx,STR_STRLEN("b")); len++;
    SHA1_Update(&ctx,STR_STRLEN("c")); len++;
    //len is only known here; "prepend" "blob 3\0" now?
    (void)len;
    SHA1_Final(digest,&ctx);
    /*pr_dgst(digest);*/

    return 0;
}
Petr Skocik
  • If the answer is no, then git SHAs look like a rather pointless way to arbitrarily break a nice property of the SHA-1 algorithm for no tangible benefit. Just saying. – Petr Skocik Nov 08 '18 at 13:28

1 Answer

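No. With SHA-1 it is only possible to add bytes to the end of the message stream. The message is consumed in 64-byte blocks, and the state after each block depends on every block before it, so a prefix cannot be mixed in after the body has been hashed. If it could, the hash function would be cryptographically broken.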
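If the length is known before the body is read (for example from the file size), the header can simply be hashed first and the body streamed after it in chunks. The following is only a minimal sketch of that idea, not how Git itself is implemented. It carries its own small SHA-1 so that the chaining between blocks is visible; `sha1_state`, `sha1_compress`, `sha1_init`, `sha1_update`, `sha1_final`, `to_hex` and `git_blob_hex` are hypothetical names made up for this illustration, and the same flow should work with OpenSSL's `SHA1_Init`/`SHA1_Update`/`SHA1_Final`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal SHA-1 state: chaining value, partial block, byte count. */
typedef struct {
    uint32_t h[5];
    uint8_t  block[64];
    size_t   used;   /* bytes currently buffered in block */
    uint64_t total;  /* total message bytes fed so far */
} sha1_state;

static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Compression step: the new chaining value h depends on the old h,
   which is why earlier input cannot be changed after the fact. */
static void sha1_compress(uint32_t h[5], const uint8_t p[64])
{
    uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 |
               (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
    for (int i = 16; i < 80; i++)
        w[i] = rol(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t t = rol(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

static void sha1_init(sha1_state *s)
{
    static const uint32_t iv[5] =
        { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
    memcpy(s->h, iv, sizeof iv);
    s->used = 0;
    s->total = 0;
}

/* Append-only: bytes go to the end of the stream, full blocks are compressed. */
static void sha1_update(sha1_state *s, const void *data, size_t len)
{
    const uint8_t *p = data;
    s->total += len;
    while (len--) {
        s->block[s->used++] = *p++;
        if (s->used == 64) { sha1_compress(s->h, s->block); s->used = 0; }
    }
}

static void sha1_final(sha1_state *s, uint8_t out[20])
{
    uint64_t bits = s->total * 8;  /* message length, taken before padding */
    uint8_t pad = 0x80, zero = 0, lenbytes[8];
    sha1_update(s, &pad, 1);
    while (s->used != 56) sha1_update(s, &zero, 1);
    for (int i = 0; i < 8; i++) lenbytes[i] = (uint8_t)(bits >> (56 - 8*i));
    sha1_update(s, lenbytes, 8);
    for (int i = 0; i < 20; i++) out[i] = (uint8_t)(s->h[i/4] >> (24 - 8*(i%4)));
}

static void to_hex(const uint8_t dg[20], char hex[41])
{
    for (int i = 0; i < 20; i++) sprintf(hex + 2*i, "%02x", dg[i]);
}

/* Assumes total_len is known up front: hash "blob <len>\0" first,
   then stream the body chunks (NUL-terminated strings here for brevity). */
static void git_blob_hex(const char *const chunks[], size_t n,
                         size_t total_len, char hex[41])
{
    sha1_state s;
    uint8_t dg[20];
    char header[32];
    int hl = snprintf(header, sizeof header, "blob %zu", total_len);
    sha1_init(&s);
    sha1_update(&s, header, (size_t)hl + 1);  /* +1 keeps the trailing NUL */
    for (size_t i = 0; i < n; i++)
        sha1_update(&s, chunks[i], strlen(chunks[i]));
    sha1_final(&s, dg);
    to_hex(dg, hex);
}
```

Calling `git_blob_hex` with the chunks `"a"`, `"b"`, `"c"` and `total_len` 3 should give the same ID as hashing "blob 3\0abc" in one go, because the bytes still reach SHA-1 in the same order. When the length really is unknown until the end, the body has to be buffered or read twice.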

One upside of the prefix is that you can store two files with a known bare SHA-1 collision in the same repository and they will get different blob IDs!