
I tried pushing my local repo to the remote and noticed that it was taking an ungodly amount of time. So I searched around a little and came across git-sizer. Running git-sizer generates the following report:

Processing blobs: 1508                        
Processing trees: 315                        
Processing commits: 22                        
Matching commits to trees: 22                        
Processing annotated tags: 0                        
Processing references: 1                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Biggest objects              |           |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [1] |  4.71 k   | ****                           |
| * Blobs                      |           |                                |
|   * Maximum size         [2] |   440 MiB | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Maximum path length    [3] |   142 B   | *                              |
| * Total size of files    [4] |  8.55 GiB | *********                      |

[1]  c51165063bd15a74a3a9f5b03dd40c42f70e004e (7273dece03a5fd401b70c8bf04da67f5f6491d43:maxlife_10m_data.snappy.parquet)
[2]  8e1f3fa7aa5fd70ca4cabc8a3d0f4e20517f050c (1ba7cf0afc90c55b16cc15555ef17d54354c354b:tests/test_output_data/fep_tests/multi_clf_fe_output_train_data.csv/multi_clf_fe_output_train_data.csv)
[3]  17d038c0621352725bfc1e7d3bf38ed4480b69a1 (1ba7cf0afc90c55b16cc15555ef17d54354c354b^{tree})
[4]  a959c9e3fe72b7f0a14e1ed188c9130fabc7f526 (3cacec40355ddc12c0fd5d1ba9d1901da47e3843^{tree})

The "Biggest checkouts" section reports a total of about 8.5 GiB, which is far larger than my repo size of ~100 KB. How do I resolve this issue?

Clock Slave
  • You seem to have large blob files in your repo. Git can't store changes to binary blobs efficiently and will add a new one every time they change. Basically, don't put binary files into your repo. – Liam Aug 08 '18 at 10:27
  • I have seen encoding issues cause this also, but I think you'd struggle to have a 440 MB text file, so I'm guessing this is an image or a database backup or something – Liam Aug 08 '18 at 10:29
  • Oh, you'll also want to remove this file from all previous commits to make it go away. So you'll need to rewrite the commits that include this file. See [How can I completely remove a file from a git repository?](https://stackoverflow.com/questions/3458685/how-can-i-completely-remove-a-file-from-a-git-repository) – Liam Aug 08 '18 at 10:30
  • @Liam, No. I don't have an image or database backup. The largest file I have is a 12 KB CSV file. Also, I am not sure what binary files are. Do they include CSV files? From my understanding, they don't. I used to have a bunch of large CSV files, but now that I am pushing to the remote I ran `git rm --cached` on them. – Clock Slave Aug 08 '18 at 10:33
  • @Liam, I tried the link you added above, but it results in `Cannot rewrite branches: You have unstaged changes.` – Clock Slave Aug 08 '18 at 10:35
  • `git rm` only removes the file from the latest commit onward. The file will still exist in the history. Those CSV files sound like the culprit to me. Your error seems to suggest you have unstaged changes. Rewriting a repo is quite an advanced task. The fact you're confused by that error message suggests you're not that confident in Git usage, so proceed with caution – Liam Aug 08 '18 at 10:42
  • @Liam. You are right here. I have very little idea how git works. Any idea as to what direction I should take? – Clock Slave Aug 08 '18 at 10:46
  • I think I've pretty much helped as much as I can here. It *appears* that these CSV files are your issue. A quick solution would be to start a new repo from scratch, but that would obviously mean losing all your history. – Liam Aug 08 '18 at 10:48
  • Okay, I'll check if we can afford losing history on this one. Thanks, @Liam. – Clock Slave Aug 08 '18 at 10:53
  • Another option is to spend some time trying to rewrite the history. Provided you do this on a copy of your repo and don't push until you're 100% sure, you can limit any damage you may do. That's all covered in [the link](https://stackoverflow.com/questions/3458685/how-can-i-completely-remove-a-file-from-a-git-repository?noredirect=1&lq=1). Good luck – Liam Aug 08 '18 at 10:55
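
Before rewriting anything, it may help to confirm which files in the history are actually large. The following is a minimal sketch, not a definitive tool: it assumes the `git` CLI is on the PATH, and the helper names `parse_batch_check` and `largest_blobs` are made up for illustration. It pipes `git rev-list --objects --all` into `git cat-file --batch-check` and sorts the blobs by size.

```python
import subprocess
from typing import List, Tuple


def parse_batch_check(output: str) -> List[Tuple[int, str, str]]:
    """Turn `git cat-file --batch-check` lines into (size, sha, path), largest first.

    Each line is expected to look like "<type> <sha> <size> <path>", which is
    the format requested in largest_blobs() below. Non-blob objects are skipped.
    """
    blobs = []
    for line in output.splitlines():
        # Split at most 3 times so that paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return sorted(blobs, reverse=True)


def largest_blobs(repo: str = ".", top: int = 10) -> List[Tuple[int, str, str]]:
    """List the `top` largest blobs reachable from any ref, including old commits."""
    # Every object in the history, one "<sha> <path>" per line.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for each object's type and size; %(rest) echoes the path back.
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes)[:top]
```

Calling `largest_blobs(".")` from the repository root returns tuples of size in bytes, object ID, and path. Files that were removed with `git rm --cached` but still exist in older commits should appear in this list, which gives you the paths to target if you decide to rewrite the history on a copy of the repo.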
