My question may be unusual, so let me explain:

  • I made a first Git version of a very big folder (20 GB) and pushed it to my host.
  • I reduced the size of this folder (removing data I don't need) and made a second version of it.
  • The problem is that Git keeps the first version, so the .git/ folder stays very heavy.
  • I don't need the first version anymore.

I suppose Git does incremental backups, so if I find a way to delete the first version, the second one will be missing a lot of data, nearly everything...

The easiest way for me would be to just erase my .git/ folder and create a new one, but even then it stays very heavy (down from 20 GB to 13 GB) and takes me hours to save.

So here is my question:

Do you know whether Git can do "decremental" or "synthetic full backup difference" backups, and if not, do you know of a tool that can?

Thanks

PS: I saw this solution on git:

  • git checkout --orphan latest_branch
  • git add -A
  • git commit -am "commit message"
  • git branch -D master
  • git branch -m master
  • git push -f origin master

It erases the log history in the Git repo, but not the history of the data itself, so it's not a solution.

Anacarde
  • Delete and only retrieve the latest version: https://stackoverflow.com/questions/1209999/using-git-to-get-just-the-latest-revision – Marged Jan 11 '20 at 15:17
  • Does this answer your question? [Using git to get just the latest revision](https://stackoverflow.com/questions/1209999/using-git-to-get-just-the-latest-revision) – Marged Jan 11 '20 at 15:17
  • Not exactly. What's new is that I get a Git repo whose size corresponds to the last version (which I thought I couldn't have), but I also have to download all the data (which I already have on my computer, since the last version is on it) – Anacarde Jan 11 '20 at 16:34
  • And if I then do a git push -f from my local repo containing only the last version of my data, my remote repo still keeps all the historic data – Anacarde Jan 11 '20 at 16:49

2 Answers

Git does not do « incremental backups ». It does « commits ». A commit is generally identified by a hash (think of it as an address) that allows you to retrieve, among other things,

  • the previous commit address (hash)
  • the commit author
  • the tree at the time of the commit

The last one is generally the most interesting, as the tree effectively describes an entire directory/file hierarchy by use of the same hash->content system! (In fact, git has been described as a VCS built on top of a content-addressable filesystem. Further, git saves some space (I believe) when files in trees do not differ, because their addresses are the same, and thus require only one object in git’s database.)
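
To make the hash->content idea concrete, here is a minimal sketch in a throwaway repository. The directory, file names, and identity settings are all hypothetical and chosen only for illustration. It prints the raw commit object and then checks that two files with identical content resolve to the same blob.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
git config commit.gpgsign false

# Two files with byte-for-byte identical content.
printf 'same content\n' > a.txt
printf 'same content\n' > b.txt
git add a.txt b.txt
git commit -q -m "first commit"

# Show the commit object itself: a tree hash, author and committer lines,
# and the message. A non-root commit would also list its parent hash(es).
git cat-file -p HEAD

# Resolve each path to the hash of the blob that stores its content.
hash_a=$(git rev-parse HEAD:a.txt)
hash_b=$(git rev-parse HEAD:b.txt)
echo "a.txt -> $hash_a"
echo "b.txt -> $hash_b"
```

Both paths should print the same hash, which is why unchanged or duplicated files cost only one object in the database.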

What all this means for you is

  1. Stop thinking of git as backup software; it’s version-control software, and understanding at least part of the underlying model will get you a long way. @torek has written many SO answers about this, and there are a few good talks.
  2. To erase history means creating a completely new commit, one that points to a different parent commit than before (or to no parent at all).

I can think of a couple of different ways to achieve (2). The first is git rebase -i @~2, and then delete the first line in the rebase todo list (if the repository has only two commits, @~2 does not resolve and git rebase -i --root is needed instead). This tells git to rewrite history, but skip the first commit. I expect this to fail, but I would be willing to be wrong.

The other alternative is git filter-branch, and there are hundreds of web pages that cover how to use it to delete certain files from the entire history.

Afterwards, a git gc should help save some space, but only do this once you’re sure everything worked out.
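
As a rough sketch of why the old data lingers and what makes git gc actually drop it, the script below replays the orphan-branch recipe from the question in a throwaway repository and then expires the reflog before collecting garbage. Everything in it (the scratch directory, big.bin, small.txt, the 1 MiB size) is a made-up stand-in, and it only covers the local repository.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository standing in for the real one.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
git config commit.gpgsign false

# "First version": includes a large file (random data, so it barely compresses).
head -c 1048576 /dev/urandom > big.bin
printf 'keep me\n' > small.txt
git add -A
git commit -q -m "first version"
big_blob=$(git rev-parse HEAD:big.bin)

# "Second version": the large file is removed from the working tree.
git rm -q big.bin
git commit -q -m "second version"

# Orphan-branch recipe from the question: one new root commit, old branch deleted.
old_branch=$(git symbolic-ref --short HEAD)
git checkout -q --orphan latest_branch
git add -A
git commit -q -m "only the latest version"
git branch -q -D "$old_branch"
git branch -m "$old_branch"

size_before=$(du -sk .git | cut -f1)

# The old commits are no longer on any branch, but the reflog still refers to
# them. Expire those entries, then prune unreachable objects immediately.
git reflog expire --expire=now --all
git gc -q --prune=now

size_after=$(du -sk .git | cut -f1)
echo ".git size: ${size_before} KiB before, ${size_after} KiB after"
```

This destroys the ability to recover the old commits, so it belongs after you have checked the new history. A remote that received git push -f keeps its own copy of the old objects until garbage collection runs on that side; whether and when that happens depends on the host.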

D. Ben Knoble
  • `git gc` saves space on disk but surely increases size of any incremental backup because for every backup new pack files will be copied. – phd Jan 11 '20 at 18:03
  • @phd huh? I’m not sure I follow your reasoning. – D. Ben Knoble Jan 11 '20 at 18:13
  • Every run of `git gc` removes old pack files and creates new huge pack files. Run an incremental backup and those new huge pack files will be copied. Without `git gc` an incremental backup will copy only new files in `.git/objects/`. – phd Jan 11 '20 at 18:16
  • @phd do you mean an incremental backup outside of git (à la TimeMachine)? I would argue that’s a non-issue (space is cheap, and distributing git projects you care about lessens the need to have good backups of them). – D. Ben Knoble Jan 11 '20 at 18:18
  • If space is a non-issue who needs an incremental backup at all? Just copy everything and be done! – phd Jan 11 '20 at 18:19

Thanks for your answers. I didn't find a way, so I simply chose to erase and recreate my repo's .git/ folder.

Anacarde