
Motivation: I spend a fair amount of time jumping through hoops to keep my git repository clean and small. This means avoiding binary files and images, preferring download/generate scripts over raw data, etc. It would be quite useful to me to have a tool that helps me figure out the actual impact that a commit will have before I apply it.

My question has two parts: Size and Integration.

  1. Size: How can I determine the impact of a git commit before I commit it? I have found a number of solutions to the "size" problem that don't really answer my question.

    1. Solution 1 - The top answer doesn't provide actual commands or a script, and it focuses on network bandwidth rather than repository size. The second answer there doesn't work.
    2. Solution 2 - The script linked in the accepted answer just looks at new files. What if I alter an existing file?
    3. Solution 3 - The script doesn't work. For example, my most recent commit is reported as 0 bytes by this script. That can't be true: I modified a file, so there must be some record in my git history, with non-zero size, stating exactly how the file was altered.

    What I would really like is a script that tells me the following: If my repository's size on the git server is S1 before I add, commit, and push, and S2 after I add, commit, and push, how can I figure out the value X = S2 - S1 prior to adding, committing, and pushing?

  2. Integration: How can I best integrate this size calculation into my standard git flow? Ideally I would like this commit size X to be displayed to me whenever I run git status. For example (see "Impact"):

    $ git status
    On branch master
    Your branch is up-to-date with 'origin/master'.
    Changes to be committed:
    Impact: 1234 bytes
      (use "git reset HEAD <file>..." to unstage)

            new file:   new_file_1.png
            new file:   new_file_2.sh
            new file:   new_file_3.cpp


I can see how this might not be possible without editing the git binary itself - if that's the case, then just having a script to run manually would be fine.

  • From what I know about how Git works I don't believe this question is solvable. – Andrew C Feb 04 '16 at 18:59
  • "the actual impact" depends on way too many things. Git's compression doesn't find duplicates just from adjacent history. – jthill Feb 05 '16 at 01:06
  • Take a look at `git repack --help`. The end result of the size is dependent on what options you pass to that, as well as what objects are in the repository itself. Predicting how the command will be run on the remote and what objects the remote has available for delta compression isn't knowable at commit time. If all you want is a rough estimate you could repack after you commit I suppose. – Andrew C Feb 08 '16 at 07:30
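
As a rough take on the repack idea from the comment above, something like this could approximate the delta after committing. This is only a sketch: it assumes GNU `du`, and the remote may repack with different options and objects, so the number is an estimate at best.

    #!/bin/sh
    # Commit, repack locally, and compare pack sizes (approximate only;
    # the remote may repack differently).
    set -e
    git repack -a -d -q                                   # repack everything into one pack
    before=$(du -sb .git/objects/pack | cut -f1)          # pack size before the commit
    git add -A && git commit -m "size probe"              # hypothetical probe commit
    git repack -a -d -q
    after=$(du -sb .git/objects/pack | cut -f1)           # pack size after the commit
    echo "Approximate impact: $((after - before)) bytes"
    git reset --soft HEAD~                                # undo the probe commit, keep changes staged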

1 Answer


Write a script (batch, shell, Python, etc.):

  1. git stash

  2. Compute the total size of your local repo (recursively check all files in all directories, using your OS's API to get file size) and assign it to a variable. Remember to skip the .git folder.

  3. git stash pop --index

  4. Subtract the earlier variable from the new total size and print the difference.

  5. git status. If you use a programming language, you can capture the output of git status (in Java, via ProcessBuilder.getInputStream()) into a string, then format it into exactly what you want using the data from step 4.

Then you just have to run that program/script before you push.
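
A minimal shell sketch of those steps (assuming GNU `du` for byte-accurate sizes, and that there are actually changes to stash):

    #!/bin/sh
    set -e
    git stash                                     # 1. shelve staged + unstaged changes
    before=$(du -sb --exclude=.git . | cut -f1)   # 2. working-tree size without the changes, skipping .git
    git stash pop --index                         # 3. restore, re-staging what was staged
    after=$(du -sb --exclude=.git . | cut -f1)    #    working-tree size with the changes
    echo "Impact: $((after - before)) bytes"      # 4. print the difference
    git status                                    # 5. show status alongside the impact

Note this measures uncompressed working-tree bytes, not what Git will actually store or transfer.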

EDIT

To get the compressed size, it looks like the simpler and surer way is to just check the server size, `git push`, check the server size again, then `git reset --soft HEAD~`. If you are using GitHub, you can get the size with the API, e.g. https://api.github.com/repos/git/git; look for "size".
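
If the server is GitHub, the before/after check could look like this sketch (assuming `curl` and `jq` are available; the API's `size` field is in kilobytes and may lag behind the push):

    #!/bin/sh
    set -e
    repo="https://api.github.com/repos/git/git"   # substitute your own repository here
    s1=$(curl -s "$repo" | jq .size)              # server-side size before the push, in KB
    git push
    s2=$(curl -s "$repo" | jq .size)              # server-side size after the push, in KB
    echo "Impact: $((s2 - s1)) KB"
    git reset --soft HEAD~                        # undo the commit locally; the remote still has it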

  • This is a clever idea. Thanks! It's a little suboptimal, because it would alter your staged files and because of the time cost of recursive reads. Still, I might go ahead with this if there's nothing more direct available, as [Andrew C](http://stackoverflow.com/users/4021077/andrew-c) indicated. – Jake Feb 04 '16 at 23:46
  • `git stash pop --index` will re-stage previously staged files; I've edited that in and added step 5 to accomplish exactly what you want. Recursion doesn't mean it's slow, it just searches through deeply nested folders easily. You can use a loop too, albeit slightly more complicated in terms of algorithm. You should ignore the .git folder too. It should take under 1 ms to compute the total file size. – Roy Wang Feb 05 '16 at 00:43
  • What does any of this have to do with the compressed sizes that Git stores? – Andrew C Feb 05 '16 at 00:50
  • @Andrew C If you read the question, he isn't actually talking about content in .git; he just wants to know how many bytes a new commit will add to his remote. – Roy Wang Feb 05 '16 at 00:54
  • The only thing you push to the remote is in the .git directory. What you are talking about is the uncompressed size of the working directory, which he doesn't mention in the question. – Andrew C Feb 05 '16 at 00:57
  • That is not even remotely close to true. Please stop. – Andrew C Feb 05 '16 at 01:08
  • I was wrong about what I said earlier, but at least this solution gives the maximum possible size change on the remote, and the difference can be compared (relatively) across commits if there isn't much change in file format. – Roy Wang Feb 05 '16 at 01:16
  • @AndrewC Maybe this wasn't the intent, but my understanding was that I should check the size of the .git directory before and after stashing. Are you saying that won't work? Could you briefly explain why? – Jake Feb 07 '16 at 08:05
  • I took a look at .git; stashing won't clean the directory - it's related to the reflog. You can only clear it by clearing the reflog, but one .pack file still won't get cleared; not sure if that will affect the actual difference. Looks like the simpler and surer way is to just check the server size, `git push`, check the server size again, then `git reset --soft HEAD~`. If you are using GitHub, you can get the size with the API, e.g. https://api.github.com/repos/git/git; look for `"size"`. – Roy Wang Feb 07 '16 at 09:27