To pack a 40-character hex SHA-1 into 20 bytes, we are doing this:

(defn pack-sha-1 [sha-1] 
  (->> sha-1
       (partition 2)
       (map (partial apply str)) ;; To convert back to list of strings
       (map (fn [hex] (-> hex
                         (Integer/parseInt 16)
                         char)))
       (apply str)))             ;; To convert back to string

First we partition the string into pairs of hex characters, and then convert each pair into a single character.

This is essentially the packing step where 2 hex characters are converted into a single character.

When we try to convert the packed string back to its 40-character SHA (using https://www.rapidtables.com/convert/number/ascii-to-hex.html), it does not give the same SHA back in all cases.

It works for: "5e4fb7a0afe4f2ec9768a9ddd2c476dab7fd449b"

But it does not work for: "00c35422185bf1dca594f699084525e8d0b8569f"

Whenever a pair of hex characters falls in the range "08" to "0d", it does not work.

What is going wrong here?

This is done as part of implementing James Coglan's book "Building Git" in Clojure.

Thanks for your help!

Suvrat Apte
  • Why do you need a string back? Couldn't you use a byte array so you don't have to battle against character encodings and such things? https://stackoverflow.com/questions/140131/convert-a-string-representation-of-a-hex-dump-to-a-byte-array-using-java – cfrick Mar 20 '22 at 12:06
  • @cfrick Yes, that's what I did in the end. But I was still curious about what is going wrong here. – Suvrat Apte Mar 21 '22 at 13:23

2 Answers

In general, it is not possible to pack 40 arbitrary bytes into 20 bytes in a reversible manner. You will lose information in the packing. I have not read the "Building Git" book, so I am not quite sure what exactly they are trying to do. Perhaps you can edit your question to clarify.

A SHA-1 is always 20 bytes. In order to print those 20 bytes, you can display them using 40 hex characters, where each byte is displayed as 2 hex characters. That is not 40 bytes, but 40 characters for the display.
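The relationship between the 20 bytes and their 40-character hex display can be sketched as follows. This is only an illustration, not code from the book or the question: the helper names `hex->bytes` and `bytes->hex` are hypothetical, and the sketch uses a byte array rather than a string so that no character encoding is involved.

```clojure
;; Hypothetical helpers for illustration only.
(defn hex->bytes
  "Parse a hex string into a byte array, two hex characters per byte."
  [hex]
  (byte-array
   (map (fn [[a b]]
          ;; unchecked-byte wraps values 128-255 into Java's signed byte range
          (unchecked-byte (Integer/parseInt (str a b) 16)))
        (partition 2 hex))))

(defn bytes->hex
  "Render each byte as two lowercase hex characters."
  [bs]
  ;; bit-and with 0xff turns the signed byte back into 0-255 before formatting
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

(def sha "00c35422185bf1dca594f699084525e8d0b8569f")

(count (hex->bytes sha))                ;; 20 bytes
(= sha (bytes->hex (hex->bytes sha)))   ;; round trip gives the same 40 characters
```

Going from 40 hex characters to 20 bytes is reversible here only because each hex character carries 4 bits of information, so a pair fits exactly into one byte.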

There can be other ways to display the 20 bytes. For example, you could display each byte as an unsigned integer between 0 and 255 (1-3 characters), with a space between them. In that case, your second example SHA would be displayed as "0 195 84 34 24 91 241 220 165 148 246 153 8 69 37 232 208 184 86 159".
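As a rough sketch of that decimal display, the function below converts each pair of hex characters to its unsigned value and joins the values with spaces. The name `hex->decimal-display` is made up for this example.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper for illustration only.
(defn hex->decimal-display
  "Show each byte of a hex string as an unsigned decimal number (0-255)."
  [hex]
  (->> (partition 2 hex)
       (map (fn [[a b]] (Integer/parseInt (str a b) 16)))
       (string/join " ")))

(hex->decimal-display "00c35422185bf1dca594f699084525e8d0b8569f")
;; => "0 195 84 34 24 91 241 220 165 148 246 153 8 69 37 232 208 184 86 159"
```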

The byte that displays as 08 in hex is the backspace character. I tried typing a backspace into the ASCII-to-Hex converter you provided and was not able to. So, I would not trust that converter to work correctly for all characters.

dorab

Actually, I think it works. You may be having trouble copying and pasting the string that pack-sha-1 outputs.

If you do: (def packed-sha-1 (pack-sha-1 "00c35422185bf1dca594f699084525e8d0b8569f"))

and then convert it back using the hexify function from the answer to "Clojure's equivalent to python's encode hex and decodehex":

(hexify (map short (seq packed-sha-1)))

you get back the "00c35422185bf1dca594f699084525e8d0b8569f" string.

(= "00c35422185bf1dca594f699084525e8d0b8569f" (hexify (map short (seq packed-sha-1))))
true
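For anyone who wants to reproduce this without following the link, here is a self-contained sketch. The `hexify` below is my own stand-in and may differ from the one in the linked answer; I am only assuming it takes a sequence of numbers and returns two hex characters per number. `pack-sha-1` is copied from the question.

```clojure
(defn pack-sha-1 [sha-1]
  (->> sha-1
       (partition 2)
       (map (partial apply str))
       (map (fn [hex] (-> hex
                          (Integer/parseInt 16)
                          char)))
       (apply str)))

;; Stand-in for the linked answer's helper (an assumption, not the original).
(defn hexify
  "Render each number as two lowercase hex characters."
  [nums]
  (apply str (map #(format "%02x" (bit-and (long %) 0xff)) nums)))

(def sha "00c35422185bf1dca594f699084525e8d0b8569f")
(def packed-sha-1 (pack-sha-1 sha))

(count packed-sha-1)                          ;; 20 characters
(int (nth packed-sha-1 12))                   ;; 8, the backspace control character
(= sha (hexify (map short (seq packed-sha-1)))) ;; the round trip succeeds
```

The character at index 12 comes from the "08" pair. It is a control character, which is the kind of thing that tends to get lost when pasting into a web form.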
tudor