I tried to push files that were too big for the GitHub limit and couldn't push them. I removed them from the folder, but they're still there, waiting to be committed, and it's blocking other push attempts too.

  • Does this answer your question? [Removing multiple files from a Git repo that have already been deleted from disk](https://stackoverflow.com/questions/492558/removing-multiple-files-from-a-git-repo-that-have-already-been-deleted-from-disk) – Lewis Feb 15 '20 at 01:58
  • This is a better link, but it has a LOT of answers, only some of which apply: https://stackoverflow.com/q/927358/1256452 – torek Feb 15 '20 at 02:06
  • The first link's solution didn't work, I'll try again with the new one. If it changes anything, I had committed something after the large files. – CScience416 Feb 15 '20 at 02:30
  • In a way, yes, but when I tried it then it didn't work, although the commands are similar. – CScience416 Feb 16 '20 at 21:06

2 Answers

Your problem is due to the fact that Git doesn't push files. Git pushes commits.

Git doesn't really store files either: Git stores commits. Commits themselves store files, but the key point is, you either have all of a commit, or none of it. There's no way to pick and choose just a few files out of it.

When you added the big files and committed, you made commits that have those files. Then you made additional, later commits that don't have those files.

So, keeping in mind that each commit stores a full and complete snapshot of all of your files—well, all of the files that are in that commit, which might be more or fewer files than some earlier commit—let's look at the rest of a commit. Besides the snapshot, a commit stores some metadata: some information about the commit itself. For instance, the commits you made have your name and email address in them, from your user.name and user.email settings. When you ran git commit, Git made you enter a log message, to say why you made the commit; the log message for each commit is part of that commit's metadata.

Commits are numbered, but not sequentially (that would be too easy!). They have big ugly hash IDs as their numbers. Every commit gets a unique hash ID, reserved for that commit forever more, and in a way, reserved for that commit long before you made it, except that the actual hash ID depends on all the metadata, and one of the items in the metadata is the date-and-time-stamp of when you made it. So until you actually chose a particular second of a particular minute of a particular hour of a particular day (etc), we wouldn't know what the hash ID would be. Still, that hash ID, from now on, means that commit. Your Git and any other Git can exchange commits, and know if they have each other's commits, by just comparing hash IDs.

The last key piece of metadata, though, is that every commit¹ stores its previous commit's hash ID as its parent. So every commit remembers which commit comes before it.

This means commits form a sort of backwards-pointing chain:

... <-F <-G <-H

where H stands for some big ugly hash ID. Commit H holds earlier commit G's big ugly hash ID as its parent; we say that H points to G. Commit G holds F's hash ID, so G points to F, and so on.
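To see the metadata and the parent link for yourself, here is a minimal sketch in a throwaway repository. The directory, file names, and identity are made up for illustration; only the git commands matter.

```shell
# Build a scratch repository with two commits (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "first commit"
echo two > b.txt
git add b.txt
git commit -q -m "second commit"

# Print the raw commit object: tree (the snapshot), parent, author, committer, message.
git cat-file -p HEAD

# The "parent" line inside HEAD is the hash ID of the commit before it.
parent_in_commit=$(git cat-file -p HEAD | sed -n 's/^parent //p')
previous_commit=$(git rev-parse HEAD~1)
echo "parent recorded in HEAD: $parent_in_commit"
echo "hash of previous commit: $previous_commit"
```

The two hashes printed at the end should match: that stored parent is the backwards-pointing arrow in the drawing above.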

Note: the hash IDs are exquisitely sensitive to every bit of data in the commit. This means no part of any commit can ever change. (That's why it's OK to trust in the hash ID.)

When you run git push, your Git calls up another Git. Your Git says: Hey, I'd like to give you my latest commit H. Do you have it yet? They say: No, send it over. Your Git says: To have H you need G. Do you have it yet? They say no, send that one too, and so on. This repeats until we reach a point where they say: Oh, yes, I have that one.

That's how your Git knows what it needs to send, and sends it all: just the right set of commits.

But you've got commit H, which doesn't have the big files, and commit G, which does, and then maybe you and they both have commit F. So your Git insists on sending both commits to them. That's because you can't have a commit, in Git, unless you have all of its parents.²


¹ Some commits store two or more previous-commit hash IDs. These are merge commits. At least one commit in any non-empty repository stores no previous hash ID: the very first commit someone makes can't remember a parent, as there is no previous commit. This kind of commit is called a root commit.

² There are some ways to avoid having all of the parents, but they don't apply to this case.


What you need to do

What you need to do, then, is construct some new commit(s).

Commit H is sort of OK on its own, except it isn't on its own: it is indelibly linked to commit G, and commit G isn't OK because it has some really big files in it that you don't want to have. Commit F is OK and they already have F ... or maybe F is bad too, and it's E that's OK and that they already have. Whatever your situation is, you have to figure out which commits are good and which aren't.
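One way to sort good commits from bad ones is to list what a push would have to send and look for large blobs in it. The sketch below builds a scratch "remote" and working repository to try this on. The paths, file names, and the 1 MB threshold are assumptions for illustration, not anything from your repository.

```shell
# Scratch setup: a bare "remote" plus a working repository (paths and names are illustrative).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo hello > small.txt
git add small.txt
git commit -q -m "F: small file"
git push -q origin master              # the other Git now has F

head -c 2000000 /dev/zero > big.bin    # stand-in for a file over the size limit
git add big.bin
git commit -q -m "G: add big file"
git rm -q big.bin
git commit -q -m "H: remove big file"

# Commits we have that origin/master does not: these are what a push must send.
pending=$(git rev-list --count origin/master..master)
git log --oneline origin/master..master

# Blobs of 1 MB or more anywhere in those commits, with the path they were stored under.
big_blobs=$(git rev-list --objects origin/master..master |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" && $2 >= 1000000 { print $3 }')
echo "large blobs still in history: $big_blobs"
```

Even though the big file is gone from the working tree and from H, it still shows up here, because G is among the commits waiting to be pushed.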

So, what we need to do is find out which commit they have, and which commits we have that are good and which are bad, and re-copy our good ones while leaving out our bad ones. There are a lot of ways to do this, but often the best one is to use git rebase -i:

git rebase -i origin/master

Our name origin/master is our Git's way of remembering their Git's master. This assumes your current branch name is master, i.e., that the picture we should draw really looks like this:

...--F   <-- origin/master
      \
       G--H   <-- master (HEAD)

Here, commit G is the one with the big file(s), and commit H is the one that takes them away again. We'd like to copy G-plus-H such that the big files are gone. Since H simply removes the file, we can use git rebase -i's squash command to combine G and H into one new commit, I, that does what G+H did, all in one:

pick <hash> subject line for commit G
pick <hash> subject line for commit H

becomes:

pick <hash> subject line for commit G
squash <hash> subject line for commit H

We write this out and git rebase goes and squashes the two and invokes our editor to let us write the new commit message.

When the whole thing is done, we end up with this:

       I   <-- master (HEAD)
      /
...--F   <-- origin/master
      \
       G--H   [abandoned]

That is, we kept existing commit F without touching it at all. Then we had Git extract G, pile H on top of it (removing the big files), and commit the result as new commit I. Our Git then took our name master off commit H, which was OK except that it linked to G. It now has our name master set to point to commit I.

We can now run git push origin master successfully: our Git will call up their Git, and say I'd like to offer you commit I if you don't have it. They'll say OK, what about I's parents? Our Git will say I's parent is F. They'll say Oh I have that one! Our Git will send commit I and then ask them to set their master to point to commit I.

If all goes well this time, they'll now have commit I. They never see our G and H again, and neither do we: we no longer have a name by which to find the big ugly hash ID of commit H.
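The whole sequence can be tried out safely in a scratch repository first. This is a sketch under some assumptions: the paths and file names are invented, commit H also adds a small notes.txt so that G plus H is not an empty change, and GIT_SEQUENCE_EDITOR with sed stands in for editing the todo list by hand.

```shell
# Scratch setup: a bare "remote" plus a working repository (paths and names are illustrative).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo hello > small.txt
git add small.txt
git commit -q -m "F: small file"
git push -q origin master

head -c 2000000 /dev/zero > big.bin
git add big.bin
git commit -q -m "G: add big file"
git rm -q big.bin
echo notes > notes.txt
git add notes.txt
git commit -q -m "H: remove big file, add notes"
old_tip=$(git rev-parse master)

# Turn the second "pick" into "squash" without opening an editor.
# GIT_EDITOR=true accepts the combined commit message as offered.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i origin/master

# Inspect the result: one new commit (I) whose parent is F, and no big blob in it.
git log --oneline --graph master origin/master
commits_ahead=$(git rev-list --count origin/master..master)
parent_of_new=$(git rev-parse master~1)
remote_before=$(git rev-parse origin/master)
big_left=$(git rev-list --objects origin/master..master | grep -c 'big.bin' || true)

# The push now only has to send I.
git push -q origin master
remote_after=$(git rev-parse origin/master)
new_tip=$(git rev-parse master)
```

In your real repository you would run plain `git rebase -i origin/master` and edit the todo list yourself; the environment variables above only make the sketch run unattended.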

Eventually—after 30 or more days, typically, though you can tweak this—our Git notices that no one seems to want H at all, and that G can only be found by starting at H and working backwards. So our Git will actually throw out commits H and G at this point (when the maintenance git gc command gets around to it, really), and our repository will shrink because we don't have the commit with the big file.
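If you would rather not wait, the expiry can be forced by hand. The following is a sketch in a throwaway repository, where `git reset --hard` stands in for the rebase as a quick way to abandon a commit. It discards the safety net that reflogs normally provide, so treat it as an illustration rather than a routine step.

```shell
# Scratch repository with a good commit F and a bad commit G (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > small.txt
git add small.txt
git commit -q -m "F: small file"
head -c 2000000 /dev/zero > big.bin
git add big.bin
git commit -q -m "G: add big file"
abandoned=$(git rev-parse HEAD)

# Stand-in for the rebase: move master back to F so that no branch names G any more.
git reset -q --hard HEAD~1

# G is still in the object database: the reflogs (and ORIG_HEAD) remember it.
git cat-file -e "$abandoned" && still_there=yes

# Drop those reminders right away instead of waiting, then collect garbage.
git update-ref -d ORIG_HEAD
git reflog expire --expire-unreachable=now --all
git gc --quiet --prune=now

if git cat-file -e "$abandoned" 2>/dev/null; then gone=no; else gone=yes; fi
echo "abandoned commit removed: $gone"
```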

(If your commits are more complicated, you may need a somewhat fancier git rebase, or git rebase -i might no longer be the best tool for all of this. A great deal depends on exactly what's wrong and whether there's more than one way to fix it.)

torek
  • I have three commits. One is the big file, the second is the removal of those files, and the third is the addition of new files. When I put in the rebase command I saw: ```pick [commit1] [commit] pick [commit2] [commit] pick [commit3] [commit]``` I turned the 2nd pick line into squash. The bash is not prompting me to do anything else. How can I confirm the new choice? – CScience416 Feb 15 '20 at 03:09
  • That's the right sequence (pick + squash + pick). If you write out the file and let rebase continue, it should open the editor on the combined commit's message. Fix that up and write it out and exit the editor and rebase will then pick the third commit atop the combined commit. Note, however, if the combination of the add and then delete winds up being "do nothing", Git won't know how to continue: you'll get a message to the effect that the combined changes don't do anything and what should rebase do now? In this case the easiest thing is to start over and just drop two of the 3 commits. – torek Feb 15 '20 at 07:00
  • That's what I ended up doing, since nothing happened after I changed the pick to squash. Thank you! – CScience416 Feb 16 '20 at 21:04
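The "start over and drop two of the 3 commits" route from the comments can be sketched like this in a scratch repository. The file names are invented, `$base` plays the role of origin/master, and the sed-based GIT_SEQUENCE_EDITOR is only a stand-in for changing the first two `pick` lines to `drop` by hand.

```shell
# Scratch repository: F, then the three commits described in the comments (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > small.txt
git add small.txt
git commit -q -m "F: small file"
base=$(git rev-parse HEAD)             # plays the role of origin/master

head -c 2000000 /dev/zero > big.bin
git add big.bin
git commit -q -m "add big file"
git rm -q big.bin
git commit -q -m "remove big file"
echo new > new.txt
git add new.txt
git commit -q -m "add new files"

# Drop the add and the removal entirely; keep only the third commit.
GIT_SEQUENCE_EDITOR="sed -i.bak '1,2s/^pick/drop/'" git rebase -q -i "$base"

commits_ahead=$(git rev-list --count "$base"..master)
big_left=$(git rev-list --objects "$base"..master | grep -c 'big.bin' || true)
git log --oneline "$base"..master
```

Dropping avoids the empty-commit question altogether, since the add and the removal never get combined.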

Have you tried git rm {file1} {file2} ... {fileN}? You have to tell Git to remove the files from the source tree even if you have already deleted the files themselves.
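A small sketch of what `git rm` does for a file that is already gone from disk, run in a throwaway repository with invented names. Note the last check: staging the deletion only affects the next commit, so any earlier commit that added the file still contains it.

```shell
# Scratch repository with one committed big file (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

head -c 2000000 /dev/zero > big.bin
git add big.bin
git commit -q -m "add big file"

rm big.bin                               # deleted from disk only
before=$(git status --porcelain)         # " D big.bin": deletion not yet staged
git rm -q big.bin                        # record the deletion in the index
after=$(git status --porcelain)          # "D  big.bin": deletion staged
git commit -q -m "remove big file"

# The earlier commit is untouched and still carries the file.
if git cat-file -e HEAD~1:big.bin; then in_history=yes; else in_history=no; fi
echo "big.bin still in an earlier commit: $in_history"
```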

Lewis
  • Do I have to put in the entire file path itself or can I just do something like ```git rm a.xyz```? – CScience416 Feb 15 '20 at 01:52
  • If you're in that directory, `cd my_directory`, you can just say `git rm my_file`. – Lewis Feb 15 '20 at 01:54
  • It says that there is no such file or directory, yet when I try to push again I still get the errors as if they were still there. – CScience416 Feb 15 '20 at 01:57
  • See [this answer](https://stackoverflow.com/questions/492558/removing-multiple-files-from-a-git-repo-that-have-already-been-deleted-from-disk). – Lewis Feb 15 '20 at 01:58