I have a Git repo that is used as a kind of library, containing very different file formats: images like JPEG and EPS, PDFs, textual files using AsciiDoc and Markdown, etc. I needed to rename many directories and files, WITHOUT any changes to their content, really only new names/paths. Looking at Git's history for that commit, all individual file renames are properly detected and shown with a status of "Rename" in TortoiseGit.
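For illustration, the renames were of the following kind; this is a minimal sketch with hypothetical paths, and the last command lists each detected rename with its similarity index (100% for pure renames):

    git mv images/old-name.jpg assets/images/new-name.jpg
    git commit -m "Rename directories and files, no content changes"
    git show --summary --find-renames HEAD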
Because only renames were involved, I expected the corresponding push to be fast, with only the new commit and tree objects being transferred. Instead, the push was pretty slow and seems to have sent the contents of all files again:
    git.exe push --progress "origin" main:main
    Enumerating objects: 207, done.
    Counting objects: 100% (207/207), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (204/204), done.
    Writing objects: 100% (206/206), 180.27 MiB | 228.00 KiB/s, done.
    Total 206 (delta 9), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (9/9), done.
    To https://[...]/library.git
    7dceaa6..c4e2e3d main -> main
    Success (815484 ms @ 11.08.2022 09:32:10)
From my understanding of Git's object model, the file contents themselves are stored as blob objects and therefore didn't change in this commit. The paths and names of those files are recorded in tree objects, which did change, and obviously a new commit object had to be created.
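This can be verified locally; assuming the hypothetical paths from the example above, both of the following commands print the same blob hash, since identical content always maps to an identical object ID:

    git rev-parse HEAD~1:images/old-name.jpg
    git rev-parse HEAD:assets/images/new-name.jpg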
So, at least after delta compression, shouldn't the client have known that the blob objects of the unchanged files are already available on the server, and therefore not have transferred them again?
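As a sketch of what I mean: assuming origin/main still points to the state before the rename commit, the following lists all objects the client considers new relative to the server, and for a pure rename I would expect only the commit and the changed trees to show up, no blobs:

    git rev-list --objects origin/main..main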
I've found a similar question with an interesting answer, though in that question some files did change. Additionally, the part of the answer mentioning additional commits by other Git clients doesn't apply to me: I was the only user of the repo in question, and the last commit with textual changes to files was made by me. So, in theory, there wasn't any unknown commit in the history.
What makes me wonder additionally are the following two things: On the one hand, my private .git directory didn't grow much, pretty much as expected. OTOH, the repo size on the server (managed with GitLab) did increase by the ~180 MiB mentioned in the push log. However, after running housekeeping in GitLab for that repo, those 180 MiB got "removed" again and the storage consumption is roughly back to the expected size from before committing the renames.
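For reference, the local size comparison can be reproduced with a standard Git command, where size-pack in the output is the packed object store in human-readable units:

    git count-objects -vH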
But the point is: for various reasons I have lots of repos of multiple hundred MiB in size containing binary files, and I sometimes need to apply renames to them during refactoring. It simply seems to be a waste of time to transmit lots of binary content in the case of pure renames, for which Git should know the content didn't change, and apparently does know again at some point after the push.
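In case it's relevant for an explanation: I assume the have/want negotiation between client and server can be watched with Git's packet tracing (shown here in Unix shell syntax; on Windows, set the environment variable beforehand):

    GIT_TRACE_PACKET=1 git push --progress origin main:main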
Any explanation? Thanks!