As far as I know, Git uses blob objects to store the content of a file in binary form. So where does it store the file format? Is it stored in the tree object? Suppose I have two files, file1.docx and file2.png, and I have committed them. Git will have the binary content of file1.docx in one blob object, and another blob object will contain the content of file2.png. But where is the file format of these two files stored? When I pull the repository, the file system will need to know the file format.

Also, if the file is a text file, does Git store its character encoding somewhere?

AlwaysLearning
  • What do you mean by *file format* here? Git stores bytes: there is no format, a file is just bytes. It's true that Git's `git diff` strongly prefers files that consist of *lines* (as `git diff` is pretty useless with things that aren't lines), but that just means that non-line-based files don't diff properly. If some file system requires "file formats", that file system is not suitable for use with Git, because Git does not store such a thing. – torek Apr 23 '20 at 04:44
  • @torek: Thanks for that. I understand the PNG case now. But as I understand it, if I have a file **_abc.txt_**, its content is stored in the file system using the encoding chosen when the file was saved, i.e. the same character produces different bytes in different encodings. If I then open the file in an editor using a different encoding, I may see replacement characters (e.g. ?). So my question is: how is text content stored as bytes in a Git blob? There must be some default character encoding for Git blobs. – AlwaysLearning Apr 24 '20 at 01:14
  • All modern file systems (i.e., not stuff like VMS from the 1970s or IBM OSes from the 1960s) store files as bytes. If a file has an encoding, that's just because the bytes are arranged in that encoding. Some Windows tools store files in UTF-16 instead of UTF-8, and when they do that, they store a special byte order mark (BOM) as the first two bytes. But that's still just two bytes. – torek Apr 24 '20 at 02:28
  • So, if your OS does funny things with encoding, it can: store a *second* file, in a constant encoding, that tells you how to *interpret* the bytes in the first file; use the file's extension (.jpg vs .txt vs whatever) to indicate the encoding; store a "magic cookie" in the first few bytes to indicate an encoding; or something else, such as: just guess at the encoding. But the *file* is just bytes. – torek Apr 24 '20 at 02:30

2 Answers

Take a look at how Git stores the objects for a commit. Each commit points to a tree object, which in turn points to the hashes of blobs (files) and of other trees (folders). File names, including extensions such as .docx or .png, are stored in the trees; a blob has no name and no format of its own, just content.

[Figure: Git objects for each commit. Source: Google]
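To make the split between blobs and trees concrete, here is a minimal sketch in Python rather than Git's own code. It assumes the default SHA-1 object format, and the helper names `blob_id` and `tree_entry` as well as the file names are made up for illustration. The blob ID is computed from the content alone, while the name only appears in the tree entry.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # A blob is hashed as "blob <size>\0" followed by the raw content.
    # Nothing about the file name or extension goes into the hash.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def tree_entry(mode: str, name: str, content: bytes) -> bytes:
    # One raw tree entry: "<mode> <name>\0" followed by the 20-byte blob ID.
    # Encoding the name as UTF-8 is an assumption made for this sketch.
    raw_id = bytes.fromhex(blob_id(content))
    return mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + raw_id


data = b"hello world\n"

# The same bytes committed under two different (hypothetical) names.
as_text = tree_entry("100644", "file1.txt", data)
as_image = tree_entry("100644", "file2.png", data)

print(blob_id(data))
print(as_text[-20:] == as_image[-20:])  # both entries point at one blob
print(as_text == as_image)              # the entries differ only by name
```

Under these assumptions, renaming file1.txt to file2.png changes the tree but leaves the blob untouched, which is why the extension is the only "format" information Git keeps.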

To answer the second question: Git doesn't care about character encoding; it stores the content as the bytes it is given. Interpreting those bytes in some encoding is left to the editor or tool that opens the file in the working tree.
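As a rough illustration of why no default encoding is needed, the sketch below (plain Python, nothing Git-specific; the sample string is arbitrary) encodes the same text three ways. Each result is just a different byte sequence, and Git would store whichever one the editor wrote to disk.

```python
import codecs

text = "é"

# The same character yields different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
# UTF-16 little-endian with an explicit two-byte byte order mark in front.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(utf8_bytes)    # b'\xc3\xa9'
print(latin1_bytes)  # b'\xe9'
print(utf16_bytes)   # b'\xff\xfe\xe9\x00'

# Reading bytes with the wrong encoding is what produces garbage or
# replacement characters in an editor; the stored bytes never changed.
print(utf8_bytes.decode("latin-1"))                  # 'Ã©'
print(latin1_bytes.decode("utf-8", errors="replace"))  # '\ufffd'
```

The wrong characters only appear at decode time, so that is a property of the tool reading the file, not of what is stored in the blob.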

Hope it was clear enough. Thanks

Abdul Pathan
  • Thanks. As I understand it, if I have a file _**abc.txt**_, its content is stored in the file system using the encoding chosen when the file was saved, i.e. the same character produces different bytes in different encodings. If I then open the file in an editor using a different encoding, I may see replacement characters (e.g. ?). So my question is: how is text content stored as bytes in a Git blob? There must be some default character encoding for Git blobs. – AlwaysLearning Apr 24 '20 at 01:19

When you pull (that is, when you check out a repository you have cloned or pulled), Git itself doesn't need to know the "file format" of any blob it stores.

It will unpack/uncompress files from a commit, and restore them byte for byte.
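A simplified sketch of that round trip, assuming the loose-object layout (a `blob <size>\0` header plus content, zlib-compressed) and ignoring packfiles and any configured filters such as line-ending conversion. The `store` and `restore` helpers are hypothetical names, not Git functions.

```python
import zlib


def store(content: bytes) -> bytes:
    # Prepend the blob header and compress, roughly as a loose object is written.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)


def restore(stored: bytes) -> bytes:
    # Decompress, split the header from the content at the first NUL byte,
    # and check the recorded size. No encoding or format is consulted.
    raw = zlib.decompress(stored)
    header, _, content = raw.partition(b"\0")
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(content):
        raise ValueError("not a well-formed blob")
    return content


# Arbitrary binary data: a PNG signature followed by every possible byte value.
original = b"\x89PNG\r\n\x1a\n" + bytes(range(256))
print(restore(store(original)) == original)
```

The same two functions work unchanged for text in any encoding, since they never look at what the bytes mean.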

VonC