
While trying to replicate git hash-object by hand, I found that my approach stops working when the content contains non-ASCII characters.

$ printf hola | git hash-object -w --stdin
b8b4a4e2a5db3ebed5f5e02beb3e2d27bca9fc9a
$ printf "blob 4\0hola" | shasum
b8b4a4e2a5db3ebed5f5e02beb3e2d27bca9fc9a

But if I add the pound sign (£), the two hashes no longer match:

$ printf hola£ | git hash-object -w --stdin
8f9852933655612593d0bbd43c9f7c6f25d947a0
$ printf "blob 5\0hola£" | shasum
54386ef126fcfc9e8242c6d6bade401b1f27999a

Any idea why this is happening?

Related: How to assign a Git SHA1's to a file without Git?

Enzo

1 Answer


The number in the header is the length of the content in bytes, not in characters.

In UTF-8, £ is two bytes wide (0xc2 0xa3), so hola£ is 6 bytes long, not 5. Thus:

printf "blob 6\0hola£" | shasum is what you want

which returns 8f9852933655612593d0bbd43c9f7c6f25d947a0 as expected.
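To avoid counting bytes by hand, the header can be built from `wc -c`, which counts bytes rather than characters. The following is a minimal sketch, not how git itself is implemented: `git_blob_hash` and `sha1_hex` are hypothetical helper names, and the script assumes that either `sha1sum` or `shasum` is installed.

```shell
# Use whichever SHA-1 tool is available (assumption: at least one of them exists).
sha1_hex() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum | cut -d ' ' -f 1
    else
        shasum | cut -d ' ' -f 1
    fi
}

# Hash stdin the way a git blob is hashed: SHA-1 over "blob <byte count>\0<content>".
# git_blob_hash is a hypothetical name, not a git command.
git_blob_hash() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                          # buffer stdin so it can be read twice
    size=$(wc -c < "$tmp" | tr -d ' ')    # wc -c counts bytes, not characters
    { printf 'blob %s\0' "$size"; cat "$tmp"; } | sha1_hex
    rm -f "$tmp"
}

printf 'hola'          | git_blob_hash    # 4 bytes of content
printf 'hola\302\243'  | git_blob_hash    # \302\243 is the UTF-8 encoding of £, so 6 bytes
```

The second call spells £ as octal escapes so that the result does not depend on the encoding of the terminal or script file. Both lines should print the same hashes that git hash-object reports above.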

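Check for yourself:

Unpack the content of the just-written object. Git objects are zlib-compressed, so the command below prepends a gzip header to make gzip accept the stream (gzip may still print a warning about the missing trailer):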

printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" | cat - .git/objects/8f/9852933655612593d0bbd43c9f7c6f25d947a0 | gzip -dc | xxd

(or use pigz: Deflate command line tool)
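As an alternative to decompressing the object file by hand, git's own plumbing can report the stored type and size. This is a small sketch that assumes git is installed; it works in a throwaway repository created with `mktemp -d`, and it uses `od` instead of `xxd` only because `od` is more widely available.

```shell
# Create a scratch repository so nothing in an existing repo is touched.
repo=$(mktemp -d) && cd "$repo" && git init -q .

# Write the blob; \302\243 is the UTF-8 encoding of £.
hash=$(printf 'hola\302\243' | git hash-object -w --stdin)

git cat-file -t "$hash"                  # object type
git cat-file -s "$hash"                  # content size in bytes
git cat-file -p "$hash" | od -An -tx1    # content bytes in hex (header not included)
```

The size printed by `git cat-file -s` is the number that belongs in the `blob <size>\0` header, and the hex dump shows the two bytes c2 a3 that make up £.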

petrpulc