After creating a repository that contains some binary files (yes, git indeed doesn't handle binary files that well, but in this repository the binaries are mandatory files), performing a commit has become rather bloated. When one performs a commit, the memory usage of git reaches 2.7 GiB. Sometimes the process is even killed by the operating system because it uses up all remaining system resources.
This is probably due to the diff algorithm git uses internally, which has to take both the original and the new file into account and therefore needs to hold at least one of the two files in memory (the other one can be handled as a stream).
Is it possible to mark a file as binary and specify that the repository doesn't need to calculate the difference, but only has to check whether there is a new version? That check can be done by handling both files as streams, thus in constant memory. After all, storing the difference is probably as inefficient as storing a copy of the new version.
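To illustrate the constant-memory check described above, here is a minimal Python sketch. It reads a file in fixed-size chunks and computes the same SHA-1 object id that git assigns to a blob (SHA-1 over the header `blob <size>` plus a NUL byte, followed by the raw content); two versions count as different when their ids differ. The function names and the chunk size are illustrative assumptions, and the sketch says nothing about how git itself reads files during a commit.

```python
import hashlib
import os

# Illustrative chunk size: at most this many bytes of the file are held in memory at once.
CHUNK_SIZE = 1 << 20  # 1 MiB


def blob_id(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Compute git's blob object id for a file without loading it fully into memory."""
    size = os.path.getsize(path)
    digest = hashlib.sha1()
    # git hashes the header "blob <size>\0" followed by the raw file content.
    digest.update(b"blob %d\0" % size)
    with open(path, "rb") as stream:
        # Feed the file to the hash chunk by chunk until read() returns b"" (EOF).
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_new_version(old_path: str, new_path: str) -> bool:
    """Return True if the two files differ; both are only ever read as streams."""
    return blob_id(old_path) != blob_id(new_path)
```

For a file with no clean filters or line-ending conversion applied, `blob_id(path)` should agree with `git hash-object path`. Hashing rather than comparing byte by byte means the old version can be represented by its already stored id, so only the new file has to be streamed.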
The git repositories on this machine are maintained automatically. It would thus be nice if the marking could be automated as well, for instance by using the MIME type of the files to mark all binary files automatically.
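A minimal Python sketch of the automatic marking, assuming the goal is to emit `.gitattributes` lines using git's built-in `binary` macro attribute (equivalent to `-diff -merge -text`). Files are classified with `mimetypes.guess_type`; the set of textual `application/*` types and the NUL-byte fallback for unknown extensions are assumptions of this sketch, and `binary_attribute_lines` is a hypothetical helper name.

```python
import mimetypes
import os

# Assumption: these application/* types are textual and should keep normal diffs.
TEXTUAL_APPLICATION_TYPES = {"application/json", "application/xml", "application/javascript"}
# Assumption: for unknown types, look for a NUL byte within the first 8000 bytes.
SNIFF_BYTES = 8000


def is_probably_binary(path: str) -> bool:
    """Classify a file by its guessed MIME type, falling back to a NUL-byte sniff."""
    mime, encoding = mimetypes.guess_type(path)
    if encoding is not None:
        # Compressed wrappers such as .gz are binary regardless of the inner type.
        return True
    if mime is not None:
        return not (mime.startswith("text/") or mime in TEXTUAL_APPLICATION_TYPES)
    # Unknown extension: treat the file as binary if its first bytes contain NUL.
    with open(path, "rb") as stream:
        return b"\0" in stream.read(SNIFF_BYTES)


def binary_attribute_lines(repo_root: str) -> list[str]:
    """Return sorted .gitattributes lines marking every extension that has a binary file."""
    extensions = set()
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Do not descend into git's own object store.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            ext = os.path.splitext(name)[1]
            # Files without an extension are skipped; a per-path rule would be needed for them.
            if ext and is_probably_binary(os.path.join(dirpath, name)):
                extensions.add(ext)
    return [f"*{ext} binary" for ext in sorted(extensions)]
```

A single binary file is enough to mark its whole extension, which is a deliberate simplification. Whether the `binary` attribute actually lowers git's memory use during a commit is exactly what is being asked here, so the sketch only covers the marking step.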