In the example I looked at, the "content" contained data that could not be decoded as UTF-8.
Here is the test code I used:
from pathlib import Path
import zlib
git_file = Path.home().joinpath(
"bluez", ".git", "objects", "c9",
"4fdc6335829ab797dd06a6f0ac3fd123dd55a8")
data = zlib.decompress(git_file.read_bytes())
print(f"Raw {data}")
print(f"Raw as hex: {data.hex(' ')}")
# Decoding all of the data at once raises:
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 27: invalid start byte
# data.decode()
header, content = data.split(b'\0', 1)
print(f"Header: {header} or as UTF8: {header.decode('UTF8')}")
print(f"Content decode replace errors:\n {content.decode('UTF8', errors='replace')}")
And here is the output it gave:
Raw b'tree 1854\x00100755 agent.py\x00W\xa7A\x83\xdf%\x96\x1f\xef\xac\xbf\xd4X\x11\xa45\xf2&e\xcb100644 bluezutils.py\x00 D\xe43)\x16\xccfy\xdf\xa4+\xe5\xae\xf7\x06\x8b\xd2\x18+100644 dbusdef.py\x00\xd3\x17\xc1\x8d\xe2\x82\xdd\x81\xdc\xef\x82\xa3\xac\x10\x08.\xbfV\x8e\xf2100644 example-adv-monitor\x00\xa4\x05\xfc{\x0e\x11\xfai\xa33\xcdn^\xce\xc7o\x14\xcfkO100755 example-advertisement\x00_\x02.\xe6v\x97\x0f\xf0\xac\xeb)\xc5\x85L\x8d`O\xc7\x97\x03100755 example-battery-provider\x00\x15"\xa5\xe0u\xca/\x04A\xcfu,U:]K\x10\xd8\xf7\xf4100644 example-endpoint\x00\x16e\x1ch:\x7f\xf2\xef\xba6\n\xff\x1b\xdd\x15\xda\xde\x85A\x9d100755 example-gatt-client\x00^k\xef\x9d{\x92\xb3\xb9\xc1\xe3\x8d7\xb2B\x1b\x13s\xaf\xd2\x9b100755 example-gatt-server\x00w#\x1c:\xd1\x02\xa2[AQ\x7f\x9bA|7\x1b{\xa04\xb0100644 example-player\x00\x14\x97\xd1\x10z\x16\x81H5/0\x08\xfc\t\xa1\x10\xd6\xba\x0c\x0b100755 exchange-business-cards\x00\x9a:\xa2\x9f\xb4v&\xa5\x156-B\xb2\x0cAB\x067C\x16100755 ftp-client\x00\xefuj\xb2\xb3\r\x92s*\x92\xeb(B\x8c\x02\xb7L\xaa\xfbc100755 get-managed-objects\x00Q%\xeeRG\xd8\x87\xc7\xb3\ntaQ\xafS\xee\xc1={\x95100755 get-obex-capabilities\x00\xa7\x98\nD%\x95i\xd4 \x10\xfc\x86aQ\xdc4\x02\xcbG\xe5100755 list-devices\x00\xb1\x12Ul0\xb2\n\xb6\xa6R7\x93\xda\x84\xda%]\xa87\xb2100755 list-folders\x00\xb4\xe3\xf1\x00\xb0\x96&\xda\x7f\xa5{\xbb\x1dY\xadn4l\xe3c100755 map-client\x00\xa2\xd9j\xe5\xf0\xea4\xf0\x16-\xe6Wk\xa6\x9f\xffm\xff"\xa8100755 monitor-bluetooth\x00\xa3\x97~ n\xec\xce\xd9\x1c\x1d\xa5\xafZHK\x1a\xcd\xda\xc0\x91100755 opp-client\x00O\x00\xa4\x1c\x01)\xea?\x14HD\x0e\xf1\xb1\xbf\r\xf6\xbe,\xc6100755 pbap-client\x00\xe6\xca\xfd\xd3\x01B!^\xf6\x16\xad?\xde\xfbPO\xe3\x03)0100644 sap_client.py\x00\xfe\xd1:\xed\xc8@\x16\x91Y\xf6\xfb\xab\x18d\xdfa[\xa2\x00\xb4100644 service-did.xml\x00R\xebh\xc0 \xabem\xf0f\xb6@E)\xff7u\xe4\x9dc100644 service-ftp.xml\x00\x1b\xda\x88W\xf5\xa8\x8e\x99p]\n\x05\xe7\xac\xb8\xd3\xb95L)100644 
service-opp.xml\x005\x1bJA\n\xdf\x97s\xc4\xe7\xcb\xc4\x81\xb1\xc3Y\xd9\xf3\nF100644 service-record.dtd\x00\xf5;\xe5\xd0R\xd2Un\xc4\x07)\x1a\xf1\xc3vd+#\xaa;100644 service-spp.xml\x00+\x15l?\x03\x81\x88]\\W\xda\xad\x91\x81A\xcf\xf2k&"100755 simple-agent\x00O\xda\xff\x1e\xb7e\xa4\x96k\xa5\r0\xc2is>D\x06!\x8c100755 simple-endpoint\x00Y\xca\x18\x9c\xe5\x0eF\xd0\xfc?\xfb\xad\x0bKm\xe1\xf6\xa5B2100755 simple-obex-agent\x00\x06Om0\xb9\xeb\xb2\x84P\x91\xae\xc9\xcc\xbe\xad\xc2!\x98\x08\xc7100755 simple-player\x00\x92h(D\xd0\xf4\xef\\\xef\xef__6\xbe\x8b\x8aq\xd1\x8dF100755 test-adapter\x00\x961\xd9O\xe3\x11\xb4\xbb\xc3\xf7\x07&\xfb\x1e\xf5\xf8\xcd3\xfa~100755 test-device\x00\xa1\xe5\x08\x16gO\xda6\xf4\x82M\x8d\x88\xfb\x89\x82\x04iZ\xe1100755 test-discovery\x00\xec\xcc|~1\xf0p\xb4\x91\xe7\xd1\xa0\xe0u\x9e\xc2\x9f\xc0 \'100755 test-gatt-profile\x00\xa9s\xae\x14\xed1\x81wV\xe7\x0b\xd4[\x9c;\xaaKN\xa90100755 test-health\x00\xd6\xb47\xed\x88\xc5/\xc6(\xbd\x08\x14(\x9b<\x1d]v\xaf\x1a100755 test-health-sink\x00Wf]+\xa6I\xaf\xa1\x1b%\xeed\x1c\x0cI\xa7\x868\xa1b100755 test-hfp\x00\x11\xe3(\xe5L\xc8h"\x04UQ\xce8)\xa7\xc3\xc7\xa4-\xf6100644 test-join\x00\x96\x97\x95\tG@6\x8bD\xcd\xda6\r\xa44[\x8f\xd9\x9b\xde100755 test-manager\x00?\xa7 Z\x04\xb6\xa1\xdc\xd40\xab\xd1\xfd\xbc\xab!\xdd\x8bN\x96100755 test-mesh\x00\xfb\xf2Gk\xfd6\x15\x8f\xf2\xec\xb8\xd5\xee\xc2\xe1oYgR\x02100755 test-nap\x00\xd5\xc7W\xb7\x9d\xe1\x1e\xc7s0\xcb\xf8\x1d\xe8\x07\xaf\xae\x11.E100755 test-network\x00\xac\xc7\xdf\xf6^HVt\xf0\x085\xd6\x93\x84\x88@&\xe7\xb0k100755 test-profile\x00\xaf\x1e#\xf7e\xdd\xef8\x16\xe6(\xb4\x06\xaa\x91\x05\x93\xde\xc3\xed100755 test-sap-server\x00\xdd\xb1\xef\xe9\xbc\x8c\xb6\x84\xc1>\xa0VO&\x10\x11\xc7\xb3-\x86'
Raw as hex: 74 72 65 65 20 31 38 35 34 00 31 30 30 37 35 35 20 61 67 65 6e 74 2e 70 79 00 57 a7 41 83 df 25 96 1f ef ac bf d4 58 11 a4 35 f2 26 65 cb 31 30 30 36 34 34 20 62 6c 75 65 7a 75 74 69 6c 73 2e 70 79 00 20 44 e4 33 29 16 cc 66 79 df a4 2b e5 ae f7 06 8b d2 18 2b 31 30 30 36 34 34 20 64 62 75 73 64 65 66 2e 70 79 00 d3 17 c1 8d e2 82 dd 81 dc ef 82 a3 ac 10 08 2e bf 56 8e f2 31 30 30 36 34 34 20 65 78 61 6d 70 6c 65 2d 61 64 76 2d 6d 6f 6e 69 74 6f 72 00 a4 05 fc 7b 0e 11 fa 69 a3 33 cd 6e 5e ce c7 6f 14 cf 6b 4f 31 30 30 37 35 35 20 65 78 61 6d 70 6c 65 2d 61 64 76 65 72 74 69 73 65 6d 65 6e 74 00 5f 02 2e e6 76 97 0f f0 ac eb 29 c5 85 4c 8d 60 4f c7 97 03 31 30 30 37 35 35 20 65 78 61 6d 70 6c 65 2d 62 61 74 74 65 72 79 2d 70 72 6f 76 69 64 65 72 00 15 22 a5 e0 75 ca 2f 04 41 cf 75 2c 55 3a 5d 4b 10 d8 f7 f4 31 30 30 36 34 34 20 65 78 61 6d 70 6c 65 2d 65 6e 64 70 6f 69 6e 74 00 16 65 1c 68 3a 7f f2 ef ba 36 0a ff 1b dd 15 da de 85 41 9d 31 30 30 37 35 35 20 65 78 61 6d 70 6c 65 2d 67 61 74 74 2d 63 6c 69 65 6e 74 00 5e 6b ef 9d 7b 92 b3 b9 c1 e3 8d 37 b2 42 1b 13 73 af d2 9b 31 30 30 37 35 35 20 65 78 61 6d 70 6c 65 2d 67 61 74 74 2d 73 65 72 76 65 72 00 77 23 1c 3a d1 02 a2 5b 41 51 7f 9b 41 7c 37 1b 7b a0 34 b0 31 30 30 36 34 34 20 65 78 61 6d 70 6c 65 2d 70 6c 61 79 65 72 00 14 97 d1 10 7a 16 81 48 35 2f 30 08 fc 09 a1 10 d6 ba 0c 0b 31 30 30 37 35 35 20 65 78 63 68 61 6e 67 65 2d 62 75 73 69 6e 65 73 73 2d 63 61 72 64 73 00 9a 3a a2 9f b4 76 26 a5 15 36 2d 42 b2 0c 41 42 06 37 43 16 31 30 30 37 35 35 20 66 74 70 2d 63 6c 69 65 6e 74 00 ef 75 6a b2 b3 0d 92 73 2a 92 eb 28 42 8c 02 b7 4c aa fb 63 31 30 30 37 35 35 20 67 65 74 2d 6d 61 6e 61 67 65 64 2d 6f 62 6a 65 63 74 73 00 51 25 ee 52 47 d8 87 c7 b3 0a 74 61 51 af 53 ee c1 3d 7b 95 31 30 30 37 35 35 20 67 65 74 2d 6f 62 65 78 2d 63 61 70 61 62 69 6c 69 74 69 65 73 00 a7 98 0a 44 25 95 69 d4 20 10 fc 86 61 51 dc 34 02 cb 47 e5 31 30 30 37 35 35 20 6c 69 73 74 2d 64 65 76 69 63 65 73 00 b1 12 55 6c 
30 b2 0a b6 a6 52 37 93 da 84 da 25 5d a8 37 b2 31 30 30 37 35 35 20 6c 69 73 74 2d 66 6f 6c 64 65 72 73 00 b4 e3 f1 00 b0 96 26 da 7f a5 7b bb 1d 59 ad 6e 34 6c e3 63 31 30 30 37 35 35 20 6d 61 70 2d 63 6c 69 65 6e 74 00 a2 d9 6a e5 f0 ea 34 f0 16 2d e6 57 6b a6 9f ff 6d ff 22 a8 31 30 30 37 35 35 20 6d 6f 6e 69 74 6f 72 2d 62 6c 75 65 74 6f 6f 74 68 00 a3 97 7e 20 6e ec ce d9 1c 1d a5 af 5a 48 4b 1a cd da c0 91 31 30 30 37 35 35 20 6f 70 70 2d 63 6c 69 65 6e 74 00 4f 00 a4 1c 01 29 ea 3f 14 48 44 0e f1 b1 bf 0d f6 be 2c c6 31 30 30 37 35 35 20 70 62 61 70 2d 63 6c 69 65 6e 74 00 e6 ca fd d3 01 42 21 5e f6 16 ad 3f de fb 50 4f e3 03 29 30 31 30 30 36 34 34 20 73 61 70 5f 63 6c 69 65 6e 74 2e 70 79 00 fe d1 3a ed c8 40 16 91 59 f6 fb ab 18 64 df 61 5b a2 00 b4 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 64 69 64 2e 78 6d 6c 00 52 eb 68 c0 20 ab 65 6d f0 66 b6 40 45 29 ff 37 75 e4 9d 63 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 66 74 70 2e 78 6d 6c 00 1b da 88 57 f5 a8 8e 99 70 5d 0a 05 e7 ac b8 d3 b9 35 4c 29 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 6f 70 70 2e 78 6d 6c 00 35 1b 4a 41 0a df 97 73 c4 e7 cb c4 81 b1 c3 59 d9 f3 0a 46 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 72 65 63 6f 72 64 2e 64 74 64 00 f5 3b e5 d0 52 d2 55 6e c4 07 29 1a f1 c3 76 64 2b 23 aa 3b 31 30 30 36 34 34 20 73 65 72 76 69 63 65 2d 73 70 70 2e 78 6d 6c 00 2b 15 6c 3f 03 81 88 5d 5c 57 da ad 91 81 41 cf f2 6b 26 22 31 30 30 37 35 35 20 73 69 6d 70 6c 65 2d 61 67 65 6e 74 00 4f da ff 1e b7 65 a4 96 6b a5 0d 30 c2 69 73 3e 44 06 21 8c 31 30 30 37 35 35 20 73 69 6d 70 6c 65 2d 65 6e 64 70 6f 69 6e 74 00 59 ca 18 9c e5 0e 46 d0 fc 3f fb ad 0b 4b 6d e1 f6 a5 42 32 31 30 30 37 35 35 20 73 69 6d 70 6c 65 2d 6f 62 65 78 2d 61 67 65 6e 74 00 06 4f 6d 30 b9 eb b2 84 50 91 ae c9 cc be ad c2 21 98 08 c7 31 30 30 37 35 35 20 73 69 6d 70 6c 65 2d 70 6c 61 79 65 72 00 92 68 28 44 d0 f4 ef 5c ef ef 5f 5f 36 be 8b 8a 71 d1 8d 46 31 30 30 37 35 35 20 74 65 73 74 2d 61 64 61 70 74 65 72 00 96 31 
d9 4f e3 11 b4 bb c3 f7 07 26 fb 1e f5 f8 cd 33 fa 7e 31 30 30 37 35 35 20 74 65 73 74 2d 64 65 76 69 63 65 00 a1 e5 08 16 67 4f da 36 f4 82 4d 8d 88 fb 89 82 04 69 5a e1 31 30 30 37 35 35 20 74 65 73 74 2d 64 69 73 63 6f 76 65 72 79 00 ec cc 7c 7e 31 f0 70 b4 91 e7 d1 a0 e0 75 9e c2 9f c0 20 27 31 30 30 37 35 35 20 74 65 73 74 2d 67 61 74 74 2d 70 72 6f 66 69 6c 65 00 a9 73 ae 14 ed 31 81 77 56 e7 0b d4 5b 9c 3b aa 4b 4e a9 30 31 30 30 37 35 35 20 74 65 73 74 2d 68 65 61 6c 74 68 00 d6 b4 37 ed 88 c5 2f c6 28 bd 08 14 28 9b 3c 1d 5d 76 af 1a 31 30 30 37 35 35 20 74 65 73 74 2d 68 65 61 6c 74 68 2d 73 69 6e 6b 00 57 66 5d 2b a6 49 af a1 1b 25 ee 64 1c 0c 49 a7 86 38 a1 62 31 30 30 37 35 35 20 74 65 73 74 2d 68 66 70 00 11 e3 28 e5 4c c8 68 22 04 55 51 ce 38 29 a7 c3 c7 a4 2d f6 31 30 30 36 34 34 20 74 65 73 74 2d 6a 6f 69 6e 00 96 97 95 09 47 40 36 8b 44 cd da 36 0d a4 34 5b 8f d9 9b de 31 30 30 37 35 35 20 74 65 73 74 2d 6d 61 6e 61 67 65 72 00 3f a7 20 5a 04 b6 a1 dc d4 30 ab d1 fd bc ab 21 dd 8b 4e 96 31 30 30 37 35 35 20 74 65 73 74 2d 6d 65 73 68 00 fb f2 47 6b fd 36 15 8f f2 ec b8 d5 ee c2 e1 6f 59 67 52 02 31 30 30 37 35 35 20 74 65 73 74 2d 6e 61 70 00 d5 c7 57 b7 9d e1 1e c7 73 30 cb f8 1d e8 07 af ae 11 2e 45 31 30 30 37 35 35 20 74 65 73 74 2d 6e 65 74 77 6f 72 6b 00 ac c7 df f6 5e 48 56 74 f0 08 35 d6 93 84 88 40 26 e7 b0 6b 31 30 30 37 35 35 20 74 65 73 74 2d 70 72 6f 66 69 6c 65 00 af 1e 23 f7 65 dd ef 38 16 e6 28 b4 06 aa 91 05 93 de c3 ed 31 30 30 37 35 35 20 74 65 73 74 2d 73 61 70 2d 73 65 72 76 65 72 00 dd b1 ef e9 bc 8c b6 84 c1 3e a0 56 4f 26 10 11 c7 b3 2d 86
Header: b'tree 1854' or as UTF8: tree 1854
Content decode replace errors:
100755 agent.py W�A��%��X�5�&e�100644 bluezutils.py D�3)�fyߤ+����+100644 dbusdef.py ����݁��.�V��100644 example-adv-monitor ��{�i�3�n^��o�kO100755 example-advertisement _.�v���)ŅL�`OǗ100755 example-battery-provider "��u�/A�u,U:]K���100644 example-endpoint eh:��6
�s*��(B��L��c100755 get-managed-objects Q%�RG؇dz
taQ�S��={�100755 get-obex-capabilities ��
D%�i� ��aQ�4�G�100755 list-devices �Ul0�
��,�100755 pbap-client ����B!^��?��PO�)0100644 sap_client.py ��:��@�Y���d�a[� �100644 service-did.xml R�h� �em�f�@E)�7u�c100644 service-ftp.xml ڈW����p]
笸ӹ5L)100644 service-opp.xml 5JA
ߗs���ā��Y��
�4[�ٛ�100755 test-manager ?� Z����0�����!N�100755 test-mesh ��Gk�6�������oYgR100755 test-nap ��W����s0�����.E100755 test-network ����^HVt5֓��@&�k100755 test-profile �#�e��8�(�������100755 test-sap-server ݱ�鼌���>�VO&dz-�
As you can see, the "content" section contains some data that can be decoded as UTF-8 (the file modes and names) and some that cannot be represented as printable characters (the binary data after each name).
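The mix of text and binary follows from how a Git tree object is laid out: each entry is "<mode> <name>", a NUL byte, and then the raw hash of the entry. Here is a minimal sketch of walking those entries. The function name `parse_tree` is my own, and it assumes a SHA-1 repository where each hash is 20 bytes. The sample bytes are the first two entries from the output above.

```python
def parse_tree(content: bytes):
    """Yield (mode, name, sha_hex) for each entry in a tree object's body."""
    pos = 0
    while pos < len(content):
        # "<mode> <name>" is text, terminated by a NUL byte
        nul = content.index(b"\0", pos)
        mode, name = content[pos:nul].split(b" ", 1)
        # Assumption: SHA-1 repository, so the next 20 bytes are the raw hash
        sha = content[nul + 1:nul + 21]
        yield mode.decode("ascii"), name.decode("utf-8", errors="replace"), sha.hex()
        pos = nul + 21


# First two entries copied from the raw output above
sample = (
    b"100755 agent.py\x00W\xa7A\x83\xdf%\x96\x1f\xef\xac\xbf\xd4X\x11\xa45\xf2&e\xcb"
    b"100644 bluezutils.py\x00 D\xe43)\x16\xccfy\xdf\xa4+\xe5\xae\xf7\x06\x8b\xd2\x18+"
)
entries = list(parse_tree(sample))
for mode, name, sha in entries:
    print(mode, sha, name)
```

Skipping a fixed 20 bytes matters because the hash itself may contain a NUL byte (the list-folders entry above does), so splitting the whole content on b'\0' would cut entries in the wrong place.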
What often causes confusion in Python is that, when displaying bytes, it shows any value in the printable ASCII range as the equivalent character.
For example, take the hex values b'\x42\x61\x64':
>>> print(b'\x42\x61\x64')
b'Bad'
This is only an artefact of how Python displays the values; the content is still the same bytes. When debugging binary data in Python it is often worth printing it as a hex string:
>>> print(b'\x42\x61\x64'.hex(' '))
42 61 64
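To convince yourself that only the display differs, a small check like the following may help. The variable names are just for illustration.

```python
data = b"\x42\x61\x64"

# The escaped and the "readable" spellings are the same three bytes
same = data == b"Bad"
# Indexing or iterating over bytes yields integers, not characters
values = list(data)
# A hex string can be turned back into the same bytes
round_trip = bytes.fromhex(data.hex(" "))

print(same)        # True
print(values)      # [66, 97, 100]
print(round_trip)  # b'Bad'
```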
In the example above, some of the values are in the printable character range, so it appears that there are strings. If we append \xff to the bytes, for example, it is still printed as a hex escape because it cannot be represented by a printable character:
>>> print(b'\x42\x61\x64\xff')
b'Bad\xff'
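The same byte is what trips up decoding. As a rough sketch, these are three of the standard error handlers that `bytes.decode` accepts, applied to the value above. The test code earlier used 'replace'; 'backslashreplace' can be easier to read when debugging because it keeps the offending byte visible.

```python
raw = b"\x42\x61\x64\xff"

# U+FFFD (the replacement character) stands in for the undecodable byte
replaced = raw.decode("utf-8", errors="replace")
# The undecodable byte is kept as a literal backslash escape in the string
escaped = raw.decode("utf-8", errors="backslashreplace")
# The undecodable byte is silently dropped
ignored = raw.decode("utf-8", errors="ignore")

print(repr(replaced))
print(repr(escaped))
print(repr(ignored))
```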
There is a good guide on Unicode & Character Encodings at:
https://realpython.com/python-encodings-guide/