Pull (rebase) deleted all files of the project

Question

Git deleted ALL the files of my project that I've been working on for the last month. Basically I tried pushing my project to GitHub but it didn't work, so I used "Pull (rebase)" and it deleted EVERYTHING (can you seriously believe this?). Is there anything I can do? Here is a link to the full log; thankfully I have it.

https://pastebin.com/4W8PAj3M

> git show --textconv :dist/index.js
> git ls-files --stage -- D:\CODE\dist\index.js
> git cat-file -s d6cf00e78b8901535facadfd2f1b02afefe5ec2f
> git show --textconv :src/index.ts
> git ls-files --stage -- D:\CODE\src\index.ts
> git cat-file -s f0ae144245ab480bd437b97be9d58d267399b602
> git push CHANapp main

score 1 · Accepted Answer · answered Nov 04 '21 at 04:48

Git didn't actually delete everything (not yet!). (The git pull command used to be able to do that, back in 2005 or so, and I did it to myself once or twice back then. I now mostly avoid git pull, for multiple reasons, not limited to this, but the lingering scars are one of the reasons.) You are, however, in a bad corner case. To get out of it, start with git rebase --abort (but read on first).

You may have already followed shrey deshwal's answer, which may be helpful, but you'll still be in this corner case. The problem here is that you started a rebase, but never either finished it or terminated it. You must pick one of these two actions, and "finish" is probably not viable now. Note that uncommitted work may be discarded when you use git reset (from the other answer) or git rebase --abort (from mine). Fortunately I see a git commit (well, > git -c user.useConfigOnly=true commit --quiet --allow-empty-message --file -) command in your history, although there's no indication as to whether this succeeded. There's also the fact that the git rebase output you show has six commits it's trying to rebase (but there are some odd properties of that pastebin log).

If all goes well, git rebase --abort will set things up exactly as you were when you started. This gets you your files back (which is obviously good), but doesn't tell you what you need to know.

Long: What you need to know before you proceed

Git is, at its heart, all about commits. Users new to Git often think it's about files, but it's not. The commits hold files, but Git is about the commits. Or, they think it's about branches, but it's not about those either: the branch names help us find commits, but Git is all about the commits.

A Git repository is thus best viewed as a big database, or perhaps a pair or set of databases. These are generally simple key-value stores. The database of commits and other internal Git objects is indexed by hash IDs, which are big, ugly, random-looking numbers, expressed in hexadecimal, as in e9e5ba39a78c8f5057262d49e261b42a8660d5b9 for instance. Git needs this hash ID in order to extract a commit, or any of the other internal objects that Git uses to make commits work. Fortunately, Git doesn't make us memorize them; we'll see how this works in a moment.

Knowing that Git is all about commits, it behooves one to know what a commit is. We already know that it's numbered—that big ugly random-looking hash ID is a number, and each commit gets a unique one, specific to that one particular commit—but what's in a commit? There are two parts:

Each commit holds a full snapshot of every file. The files inside the commit are kept in a special, read-only, Git-only, compressed and de-duplicated form, that only Git can read, and literally nothing can write.

The de-duplication takes care of the fact that most commits mostly have the same files as a previous commit. The file contents are stored as internal objects, which are also given numbers (these numbers aren't exactly unique though: every time you store the same file content, it gets the same number: that's the de-duplication in action right there). The files' names are stored as yet other internal objects. This allows a repository to store files whose name your computer can't "pronounce", as it were, if your computer runs Windows or macOS.¹

Besides the snapshot, each commit stores some metadata: information about the commit itself. This includes the name and email address of the commit's author (your new commits get yours from your user.name and user.email settings). It includes some date-and-time stamps. It includes a log message, which git log and git show will show. Crucially for Git itself, each commit stores a list of previous commit hash IDs.
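
To see the de-duplication in action, here is a minimal sketch, assuming a throwaway repository created with mktemp; the file names are made up for illustration. git hash-object prints the ID that Git would assign to a file's content, and identical content gets an identical ID regardless of the file's name.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# Two files with identical content, and one with different content
# (hypothetical file names).
printf 'same content\n' > one.txt
printf 'same content\n' > two.txt
printf 'other content\n' > three.txt

# Ask Git which object ID each file's content would be stored under.
id_one=$(git hash-object one.txt)
id_two=$(git hash-object two.txt)
id_three=$(git hash-object three.txt)

echo "one.txt   $id_one"
echo "two.txt   $id_two"
echo "three.txt $id_three"
```

The first two IDs match, so a repository holding both files stores that content only once.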

Most commits store exactly one such hash ID, which we call the parent of the commit. The commit is then the child of that parent. The child holds the parent's ID, so the child knows who its parent is. The parent, however, never holds any of its children's IDs. This is because, in order to make the hash IDs work, Git has to freeze the commit for all time as soon as it's made. (This is true of all of Git's internal objects.) This means no part of any commit can ever be changed, not even by Git itself. We make the child commit today, and if, tomorrow, it becomes a parent of a new child, well, too late now: the child's ID can't be put into the parent.

So these links, where a child points back to its parent, go only one way, backwards. (This is kind of a general theme in Git: that it works backwards.) If we draw a series of single-parent commits, with newer commits towards the right, using uppercase letters to stand in for the hash IDs, we get something like this:

... <-F <-G <-H

Here H stands for the hash ID of the latest commit. It contains the hash ID of earlier commit G: that's the arrow coming out of H, aimed at G. G then points to F, which points back still further, and so on.

Since each commit holds a full snapshot (plus the de-duplication), Git can pretty easily pull the snapshot out of H's parent G and out of H itself and compare them. Where files are the same, Git can say nothing. Where files in the two commits differ, Git can then compare the contents of the files and produce a recipe: do this to the G version of the file, and you get the H version of that file. Then Git can step back one hop, and do the same with F-vs-G. Having shown commit G this way, Git can step back yet again and show F, by comparing against its parent (presumably E), and so on.

This backwards-one-hop-at-a-time thing is how git log works: without -p, it shows each commit's author and log message and then moves backwards. With -p, it shows the same thing, adds the diff produced by comparing the parent's snapshot with this commit's snapshot, and then moves backwards. (This is all just for ordinary, single-parent, commits; when we introduce two-parent merge commits, things get complicated for git log.)
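
The following is a rough sketch of that backwards walk, assuming a scratch repository with three commits whose messages (F, G, H) are stand-ins for the letters used above. It prints the raw contents of the newest commit, checks that the stored parent ID is the same one git rev-parse reports, and then runs git log with and without -p.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Make three commits in a row (stand-ins for F, G and H).
for name in F G H; do
    echo "version $name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

# Raw contents of the latest commit: tree (the snapshot), parent,
# author, committer, then the log message.
git cat-file -p HEAD

# The parent ID stored inside the commit is what lets Git step back one hop.
stored_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
parent_of_h=$(git rev-parse HEAD^)

# git log follows those links; -p adds the parent-vs-child diff.
git log --oneline
git log -p -1
commit_count=$(git rev-list --count HEAD)
echo "commits reachable from HEAD: $commit_count"
```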

¹Technically, this is file-system-dependent: a Linux box with an NTFS file system has the same problems as a Windows box, and a macOS system on which you've made a case-sensitive volume can deal with two files, one named README and one named ReadMe, at the same time, just like Linux. But the default setups on Windows and macOS fall short when compared to Linux.

Branch names help us (and Git) find commits

With our simple linear setup with just one branch, we need to answer one question to get Git to work: What's the latest commit? That is, in our drawing, we had H standing in for the actual latest-commit hash ID. But H is really some ugly random-looking thing, ef9b31c... or whatever. We could write this on a whiteboard (and then typo it all day), or save it in a file for cut-and-paste. But why not let Git save it, in a database or something?

This is where the second main database in each Git repository comes in. Each Git repository has its own database of name-to-hash-ID mappings. A branch name, in this database, remembers which commit we want to say is the latest commit that is "on" that branch:

...--F--G--H   <-- main

Here, the name main holds the actual hash ID of commit H. So main points to H, just like H points back to G and so on. I get lazy about drawing the arrows from commit-to-commit because of text font limitations on StackOverflow (and because of laziness), but they're still there. Since they are part of the commits, they can't change, and always point backwards.

Note that if we have more than one branch name, it's possible that both branches will point to the same commit, like this:

...--F--G--H   <-- develop, main

This means all these commits are on both branches. That's another thing that's weird about Git, compared to most other version control systems: commits, in Git, are on as many branches as you like. The branch names just find the last commit.
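
As a small illustration (a sketch in a scratch repository, with the branch names main and develop taken from the drawing above), two branch names can hold the same hash ID, and git branch --contains then reports the commit as being on both:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"

# Creating a branch just records a new name for an existing commit.
git branch develop

main_id=$(git rev-parse main)
develop_id=$(git rev-parse develop)

# The names database: each branch name maps to one hash ID.
git for-each-ref --format='%(refname) -> %(objectname)' refs/heads

# Which branches is this commit "on"?
git branch --contains "$main_id"
```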

Since H is the last one on each branch, it doesn't matter which name we pick, but we do need to pick one (or, in your particular case with the rebase, none, but we'll come back to that later). To pick a branch, we use git checkout or git switch. In a new, completely empty repository, we have a problem, because a branch name has to point to some commit, and with no commits yet, we can't have any branch names yet. So GitHub can, if you ask it to, make a new repository with one initial commit in it, so that it can have a branch name:

A   <-- main

When we clone this initial GitHub repository, we get a copy of the one commit A, and no branches at all. Then our Git uses information from their (GitHub's) Git programs to figure out which branch name to create in our repository; with only one name, our Git picks our main, and checks that out:

A   <-- main (HEAD)

The HEAD-in-parentheses thing is how Git knows which branch name to use. If we now make some more names:

A   <-- dev, main (HEAD), next

our one commit is now on three branches. We can now pick one of these, with git checkout or git switch, and switch to that branch:

A   <-- dev (HEAD), main, next

If we now make a new commit ... well, you already know how to do this, so we'll just make the new commit now. The new commit gets a new, unique, random-looking hash ID, which we'll call B. New commit B points back to the commit we were using, commit A:

  B   <-- dev (HEAD)
 /
A   <-- main, next

The magic trick Git pulls here is to write B's new hash ID into the current branch name. (Technically Git stores the hash ID into refs/heads/dev in the names database.)
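
Here is a minimal sketch of that trick in a scratch repository, using the same branch names as the drawing (the commit messages A and B are just labels). Only the branch that HEAD is attached to moves when the new commit is made.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "A"
a_id=$(git rev-parse HEAD)

# Three names, one commit.
git branch dev
git branch next
git checkout -q dev

# New commit B: Git writes its hash ID into the current branch name only.
git commit -q --allow-empty -m "B"

current=$(git symbolic-ref HEAD)   # refs/heads/dev
dev_id=$(git rev-parse dev)
main_id=$(git rev-parse main)
next_id=$(git rev-parse next)
b_parent=$(git rev-parse dev^)

echo "HEAD is attached to $current"
echo "dev=$dev_id main=$main_id next=$next_id"
```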

Note that our branch names are not on GitHub. Neither is our new commit B. Commit A is in both repositories—we got our A from them—and hence has the same hash ID in both repositories, because it is the same commit. Commit B is only in our repository now.

If we now git switch main, Git will remove, from our work-tree, all the files we committed into commit B. It will then extract, from commit A, all the files that go with commit A, and we will be in this state:

  B   <-- dev
 /
A   <-- main (HEAD), next

Any files we left untracked are neither in B nor in A; they just sit around in our working tree as untracked files. Git didn't remove them when we moved away from B, because they are not in B. Git didn't extract them from A, because they are not in A either. So they are still just sitting around in the working tree.
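
A quick way to convince yourself of this, sketched in a scratch repository with made-up file names (a.txt is in commit A, b.txt is only in commit B, notes.txt is never added):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "in A" > a.txt
git add a.txt
git commit -q -m "A"

git checkout -q -b dev
echo "only in B" > b.txt
git add b.txt
git commit -q -m "B"

# An untracked file: never added, so it is in neither commit.
echo "scratch notes" > notes.txt

# Switching back to main removes B's files and leaves the untracked one alone.
git checkout -q main
ls
```

After the switch, b.txt is gone from the working tree (it is still safely stored in commit B), while notes.txt is untouched.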

If we switch from main to next, we're now switching from commit A to ... well, commit A again. There's no need to remove and replace any files, so Git doesn't bother: all that happens is that Git moves the name HEAD over to attach it to next:

  B   <-- dev
 /
A   <-- main, next (HEAD)

If we make a new commit here now, the new commit points back to A, and next points to the new commit:

  B   <-- dev
 /
A   <-- main
 \
  C   <-- next (HEAD)

If we switch back to main and make new commits, they point to A and then to the new commit:

  B   <-- dev
 /
A--D--E   <-- main (HEAD)
 \
  C   <-- next

and so on. Eventually we have a whole mess of commits, which so far are still all linear: each child has just one parent. (Parent A has three children though.)

The thing about all of this is that the commits are what matter. The branch names just let us find the most recent. Git says that some commit is "on" a branch if, by starting with that name's "last" commit, we can work our way back to the commit in question. So commit A is on all three branches.

If we delete the name next, commit C is still there:

  B   <-- dev
 /
A--D--E   <-- main (HEAD)
 \
  C   ???

Now, though, the only way to get commit C out of Git is to know its hash ID. Eventually Git will realize that C is going unused because it can't be found, and will toss it entirely, but we always have some grace period before this happens.
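
The sketch below (scratch repository, commit messages used as labels) deletes the name next and then shows that commit C can still be retrieved, and even given a name again, as long as we kept its hash ID:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "A"

git checkout -q -b next
git commit -q --allow-empty -m "C"
c_id=$(git rev-parse HEAD)   # remember C's hash ID

git checkout -q main
git branch -D next           # -D because C is not merged anywhere

# No name finds C any more...
if git rev-list --all | grep -q "$c_id"; then
    echo "C is still reachable from some name"
else
    echo "C is not reachable from any name"
fi

# ...but the object still exists and can be read by hash ID.
c_type=$(git cat-file -t "$c_id")
echo "object $c_id is a $c_type"

# Giving it a name again makes it findable in the usual way.
git branch next "$c_id"
```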

Note how untracked files—those that aren't in the commit—take no part in any of this. They just sit around occupying space in our working tree. But there can be a problem: if we have an untracked file path/to/F in our working tree, and we ask to switch to some commit that has a file named path/to/F in it, what happens? We'll hold off on this for now, although it plays a part in what you'll need to do.

Merging

You aren't merging, but it's worth a quick look at this before we move to rebasing, because rebasing uses git cherry-pick internally, and git cherry-pick uses a form of merging, internally. So we need to know how merging works.

Suppose we have achieved this graph:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

That is, right now we are using commit J, through the name br1. The parent of J is I and the parent of I is H. Meanwhile the name br2 locates commit L, whose parent is K, whose parent is H again. Commits H and earlier are on both branches, while I-J are only on br1 and K-L are only on br2.

If we now run:

git merge br2

Git will locate commit L. Merging is really about commits, not branches: once again we're just using the name to find the latest commit.

Git will now use the parent links from our current commit J and the other commit L. Working backwards from both branch tip commits, Git will find the best common ancestor: a commit that is on both branches, and is "better than" other commits that are also on both branches. "Better" in this case means roughly "closer to the branch tips", so H is better than G. The commit that Git finds this way is the merge base.

The merge is now ready to proceed, by comparing the merge base commit H snapshot against our current commit snapshot J to see what we changed, then comparing H against their snapshot L to see what they changed. Just as with the parent-to-child diff, this kind of diff shows:

which files are the same, and which are different;
for each file that is different, what to do to make it "the same".

By combining these two sets of diffs, git merge can come up with changes that will, applied to the snapshot in H, keep our changes (from H-vs-J) but also add their changes (from H-vs-L). There is a lot that can go wrong here, but if all goes well, Git applies the combined changes to the merge base snapshot, and then makes a new commit M:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

Commit M is special in exactly one way: it has two parents, instead of just one. It still has a snapshot, just like any commit: the snapshot is the one Git made when it applied the combined changes to the merge base snapshot. It still has an author (us) and a log message (the default one is the rather crappy Merge branch 'br2' into br1). But instead of just the usual one parent J, M gets a second parent L too.
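
To make the merge base and the two parents concrete, here is a sketch that builds the graph above in a scratch repository. The helper commit_file is hypothetical; it just writes one file per commit so that the two sides never conflict.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file H
h_id=$(git rev-parse HEAD)

git checkout -q -b br1
commit_file I
commit_file J

git checkout -q -b br2 "$h_id"
commit_file K
commit_file L

git checkout -q br1

# The best common ancestor of the two branch tips.
base=$(git merge-base br1 br2)

# Merge: this creates commit M on br1.
git merge -q --no-edit br2

# M records two parent lines instead of one.
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
git log -1 --format=%s
git log --graph --oneline
```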

Note that when git log hits commit M, it has a problem with showing what changed in M. The usual method is to run a diff against the (single) parent, but M has too many parents. The default answer git log here uses is to silently give up: it just doesn't show a diff at all. Git has another problem too: should it follow the first parent back to J, or the second to L, or what? The default answer git log uses here is to follow both, in a somewhat random-seeming order.

(You actually have a lot of control here, but since this is not about merge and log, we'll just move on now.)

Dealing with multiple Git repositories

You mentioned GitHub. GitHub is a site that hosts Git repositories, among other things. This means that they—GitHub—have Git repositories, which have commits in them, and branch names and other names for finding commits.

I mentioned earlier that:

git clone <url>

copies a repository by copying all of their commits and none of their branches. This is true, but it obviously creates a problem. Suppose they—GitHub—have three commits in their repository, two found by the name main and one by the name develop:

A--B   <-- main
    \
     C   <-- develop

When we clone this, we get the commits, but no branches:

A--B   ???
    \
     C   ???

We need a way to know that their main means commit B, and their develop means commit C. To achieve that, our Git—our software, working with our repository—gets branch names from their Git and changes those into remote-tracking names:

A--B   <-- origin/main
    \
     C   <-- origin/develop

These remote-tracking names remember their branch names, as of the last time we ran git fetch (git clone is basically shorthand for running a bunch of commands, with git fetch being command #4 or #5 or so). Then, having copied their commits and tweaked their branch names, our Git does a specialized git checkout that creates a branch name. We use -b at git clone time to say which of their names we want copied. If we don't use -b—most people don't, most of the time—our Git asks their Git what they recommend; in this case, that's probably main. In all cases, our Git now creates our single branch from the origin/ version of that branch in our repository, which is from the branch of the same name in their Git. Our Git then checks out this commit to fill in our working tree (and staging area):

A--B   <-- main (HEAD), origin/main
    \
     C   <-- origin/develop
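
The same situation can be reproduced locally. In this sketch a directory named upstream plays the role of the GitHub repository (that name, and the use of empty commits, are assumptions for illustration). After cloning, we have one branch of our own plus remote-tracking names for each of theirs.

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d) && cd "$work"

# "upstream" stands in for the repository on GitHub.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git checkout -q -b develop
git commit -q --allow-empty -m "C"
git checkout -q main
cd "$work"

# Clone it: all commits are copied, their branch names become origin/* names.
git clone -q upstream clone
cd clone

git branch       # our own branches
git branch -r    # remote-tracking names

local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
tracked_develop=$(git rev-parse origin/develop)
their_develop=$(git --git-dir="$work/upstream/.git" rev-parse develop)

if git rev-parse -q --verify refs/heads/develop >/dev/null; then
    has_local_develop=yes
else
    has_local_develop=no
fi
echo "local develop branch exists: $has_local_develop"
```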

We now have the ability to add new commits:

     D--E--F   <-- main (HEAD)
    /
A--B   <-- origin/main
    \
     C   <-- origin/develop

If they add new commits, say to their main and their develop, and we run git fetch now, we will get:

     D--E--F   <-- main (HEAD)
    /
A--B--G--H   <-- origin/main
    \
     C--I   <-- origin/develop

Note how this kind of thing looks exactly the same as when we make our own branches. These remote-tracking names are different from branch names in two important ways:

we won't update them ourselves: they're reserved for remembering the other Git's branches;
in service of the first point, we can't get "on" them.

In a moment we'll have more about that last part, because it really matters a lot here. For now, just note that to get "on" some branch, we need to have Git attach HEAD to the branch name. Git will do that with our branch names, but not with our remote-tracking names.

Now that we have new commits on our main, we might like to send these new commits to their Git repository. But:

git push origin main

won't work, and the reason is simple enough. We have commit F as our last commit on our main. They now have commit H as their last commit. The git push command figures out which commits we have, that they don't, that they'll need for our git push: that's D-E-F. It then sends over those commits, so that they have, in their repository, this:

     D--E--F   ???
    /
A--B--G--H   <-- main
    \
     C--I   <-- develop

Note how there is no name at all for F yet, and the name main—not origin/main, not fred/main, just plain main—finds commit H.

Our git push, having successfully sent commits D-E-F, now asks their Git to please, if it's OK, set their name main to point to commit F. If they did, they would have this:

     D--E--F   <-- main
    /
A--B--G--H   ???
    \
     C--I   <-- develop

They have just lost their G-H commits. (The commits still exist, at least for some time, but can't be found: Git finds commits by starting with the names, then working backwards, and there is no name that finds H any more.) So they will say no, I won't do that (non-fast-forward, in Git jargon).
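
Here is a sketch of that rejection using only local directories. The names hub.git, ours and theirs are made up: hub.git stands in for the GitHub repository, and theirs is a second clone that pushes first. The exact wording of the error may differ between Git versions.

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d) && cd "$work"

# Build a small history A--B and publish it as the bare repository hub.git.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
cd "$work"
git clone -q --bare seed hub.git
git clone -q hub.git ours
git clone -q hub.git theirs

# Someone else adds G and H and pushes first.
cd "$work/theirs"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
git push -q origin main
their_tip=$(git rev-parse HEAD)

# We add D, E, F on top of B and try to push.
cd "$work/ours"
for name in D E F; do
    git commit -q --allow-empty -m "$name"
done
if git push -q origin main 2>"$work/push.err"; then
    pushed=yes
else
    pushed=no
fi
cat "$work/push.err"

# Their main still finds H: our request to move it was refused.
hub_tip=$(git --git-dir="$work/hub.git" rev-parse main)
echo "pushed: $pushed"
```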

What this means is that we must now somehow combine our work, in D-E-F, with their work, in G-H. One way to do that would be to use git merge: combining work is, after all, what it's for. This is one of your options, and merge is simpler than rebase, because it's just one operation. But you may have a "rebases only" workflow, or a "rebases encouraged" one. If so, you will need to use git rebase.

Rebase is about copying commits

Your exact repository graph topology will vary. Consider using git log --graph or similar to view it, if you don't have a graphical viewer (see Pretty Git branch graphs). Let's draw a simplified view now though:

       D--E--F--G--H--I   <-- main (HEAD)
      /
...--C--J--K   <-- origin/main

Given that your rebase output mentioned 1/6, 2/6, and so on, I've drawn in six commits that need to be copied to new-and-supposedly-improved commits.

The way git rebase works is by:

First, listing out the commits to be copied: in this case that's D-E-F-G-H-I. These commit hash IDs go into a file somewhere.
Next, Git does a detached HEAD checkout of the target commit. In this case that's commit K.
Git now enters a loop. One commit at a time, Git tries to git cherry-pick each commit that is to be copied.² Each of these cherry-picks is technically a mini-merge (although the result is an ordinary commit, not a merge commit). Since you have six commits to copy, that gives six chances to have a merge error.

The effect of each cherry-pick is to take the diff from the commit's parent, to the commit, and apply that diff to the current working-tree. That is, Git "copies" the commit by turning it into changes. Git then has to apply those changes to whatever commit is checked out now. This means Git has to figure out where the lines went, in case what's checked out now moved the places the changes need to go—and that in turn means Git needs to diff the parent against the current commit. That "make two diffs and combine them" is just what we saw in git merge, and that's how cherry-picking is merging.

If all goes well, the result at this point looks like this:
```
        D--E--F--G--H--I   <-- main
       /
 ...--C--J--K   <-- origin/main
             \
              D'-E'-F'-G'-H'-I'  <-- HEAD
```
Note how the name HEAD points directly to commit I', which is the one copied from I. This is detached HEAD mode, which we'll come back to in a moment.

Because all did go well, Git now forces the name—main, in this case—that you were using, back when you started all of this, to point "here", wherever HEAD points now. Git then "re-attaches your HEAD":
```
        D--E--F--G--H--I   ???
       /
 ...--C--J--K   <-- origin/main
             \
              D'-E'-F'-G'-H'-I'  <-- main (HEAD)
```

Since nobody ever looks at raw hash IDs, it seems that Git has somehow overwritten the original commits with these new-and-improved ones. It hasn't, though: D-E-F-G-H-I still exist in your repository. You just can't find them, which means git log doesn't show them either.

Since the new commits now add on to origin/main, if you git push origin main now, you'll have your Git send D'-E'-...-I' to the other Git, and this time, they will add them on.
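
A successful rebase can be sketched in a scratch repository like this. To keep it to one repository, a local branch named target stands in for origin/main (that substitution, and the hypothetical helper add, are assumptions of the sketch). The point to notice is that main ends up with new hash IDs while the old tip is still there under main@{1}.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit so nothing conflicts.
add() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

add C
git branch target            # stand-in for origin/main
for n in D E F; do add "$n"; done
old_tip=$(git rev-parse main)

git checkout -q target
add J
add K
git checkout -q main

# Copy D-E-F onto the tip of target, then move the name main to the copies.
git rebase -q target
new_tip=$(git rev-parse main)

# The originals were not overwritten: the old tip is still a valid commit,
# and the reflog for main remembers it.
git cat-file -t "$old_tip"
previous=$(git rev-parse "main@{1}")
copied=$(git rev-list --count target..main)

git log --oneline --graph main "$old_tip"
```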

²Some forms of git rebase don't use git cherry-pick directly. In the most current version of Git, you must specifically ask for these older rebase forms. In old versions of Git, you must ask for the cherry-pick variant with -m or -i or several other flags. In most cases, these all work out the same, so it usually doesn't matter which one you use.

Your rebase is failing, leaving you in the detached HEAD mode

When Git stops in the middle of a rebase, we might have this:

       D--E--F--G--H--I   <-- main
      /
...--C--J--K   <-- origin/main
            \
             D'-E'-F'  <-- HEAD

If we look at what we have in our working tree, it's from commit F', perhaps with an attempt to merge the F-to-G changes in to make G'. If we run git log, we see commits F', then E', then D', then K, etc. Our original G-H-I commits seem to be gone. Our original D-E-F commits seem to show up, but what the log shows are really the copies D'-E'-F'; the originals are not listed.

Your git rebase output—starting at line 229 in your pastebin text—says:

Rebasing (1/6)
Rebasing (2/6)
Rebasing (3/6)
error: The following untracked working tree files would be overwritten by merge:
  node_modules/.bin/acorn
  node_modules/.bin/acorn.cmd

[mass snip]

That is, the cherry-pick or equivalent operation that's trying to copy this third commit (3/6) has run into a problem:

there exist some untracked working tree files;
there are committed copies of these files;
so Git wants to overwrite the working tree files with the committed files.

In this particular case, these files are in node_modules. This particular directory's files should almost never be in any commit, so the commits that are about to be copied are "bad" in a sense (unless you've decided that, for whatever reasons are appropriate to your situation, you should commit them). If they had never been committed, you would not be having this issue.

In "detached HEAD" state, the name HEAD is not attached to any branch name. Using git reset won't fix this. One way to fix it is to run git checkout or git switch with a branch name, but for the rebase-failure case, this is the wrong answer.

What to do about this

To get out of the "detached HEAD" state, run git rebase --abort. When you started the rebase, you had:

       D--E--F--G--H--I   <-- main (HEAD)
      /
...--C--J--K   <-- origin/main

All these commits still exist. Running git rebase --abort says throw out what I have now (using git reset internally, in fact) and check out the branch I was on when I started this rebase. So you'll be back in that mode, with your index and working tree matching commit I again. Any untracked files that Git could leave undisturbed, Git will leave undisturbed, but any work you did after the rebase stopped, and have not committed, will be lost. Any work you did and then committed will be saved in a commit, but the commit's hash ID will become difficult to find (you'll need the git reflog command after aborting the rebase).
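
The following sketch shows the abort in a scratch repository. It does not reproduce your untracked-files error; it uses an ordinary content conflict as a stand-in to make the rebase stop, and a local branch named target in place of origin/main. What it demonstrates is the part that matters here: HEAD is detached while the rebase is stopped, and git rebase --abort puts you back on main at the original commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "C"
git branch target                    # stand-in for origin/main

echo "ours" > file.txt
git commit -q -am "D"
start_id=$(git rev-parse main)

git checkout -q target
echo "theirs" > file.txt
git commit -q -am "K"
git checkout -q main

# This rebase stops partway because D conflicts with K.
if git rebase -q target >/dev/null 2>&1; then
    echo "unexpected: rebase finished without stopping"
fi

# While stopped, HEAD is not attached to any branch name.
if git symbolic-ref -q HEAD >/dev/null; then
    mid_state=attached
else
    mid_state=detached
fi
echo "during the stopped rebase, HEAD is $mid_state"
git status

# Give up on the rebase and go back to where we started.
git rebase --abort
end_branch=$(git symbolic-ref --short HEAD)
end_id=$(git rev-parse HEAD)
echo "after abort: on $end_branch at $end_id"
```

Running git status first, as above, is a cheap way to confirm that a rebase really is in progress before aborting it.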

Now you'll need to decide: should these files be committed? Whatever you decide here ("yes" or "no") means that some existing commits are likely to become "bad" commits.

Next, you'll need to correct the bad commits:

the files should be committed, so the mistake is that they weren't for a while; or
the files should never be committed, so the mistake is that they eventually were.

Because the current working tree copies of these various node_modules files are not committed, you should also decide whether you need to save the current working tree copies (however many may be left after the rebase misadventures). If you do decide to save them, you can:

commit them, or
move them outside the Git repository.

The first method of course puts them into a new commit. The second method doesn't. So if they should never be committed, use the second method.

If they don't need to be saved—this is the usual case; these files are autogenerated, which is why we don't put them in commits in the first place—then you can either move them, or remove them, at your own discretion. Use whichever you prefer (moving them away lets you move them back, which will be fast but may get you the wrong files if the autogenerated ones would differ; removing them is easier, but may take a long time to autogenerate them again later).
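
If you decide these files should never be committed, one common way to stop tracking them from now on is sketched below, in a scratch repository with a made-up package and app.js file. Note the limits of this sketch: it only affects new commits. The older commits that already contain node_modules are unchanged, so fixing those still requires the history rewriting discussed in this answer.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# A commit that (mistakenly) includes node_modules.
mkdir -p node_modules/pkg
echo "module.exports = 1;" > node_modules/pkg/index.js
echo "console.log('app');" > app.js
git add .
git commit -q -m "app plus node_modules"

# Remove the files from Git's index only; --cached leaves them on disk.
git rm -r -q --cached node_modules

# Tell Git to ignore the directory from now on.
echo "node_modules/" >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking node_modules"

tracked=$(git ls-files node_modules)
git status --short
echo "tracked files under node_modules: ${tracked:-none}"
```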

Once you've done that, you'll want to make new-and-improved commits that correct the problems with the old (and lousy?) commits. You can do that with git cherry-pick, especially git cherry-pick -n followed by hand work followed by git commit. Or, you can do that with git rebase, though this time you'll want a sort of "in place" rebase, almost certainly with -i and edit.

(That's plenty for this answer, since any next steps depend on decisions you have to make.)

git rebase --abort saved me, I kneel. Didn't read the rest of your post though. — King Of Chads, Nov 04 '21 at 05:15

score 0 · Answer 2 · answered Nov 03 '21 at 18:03

First, use git reflog.

Reference logs, or "reflogs", record when the tips of branches and other references were updated in the local repository. Reflogs are useful in various Git commands, to specify the old value of a reference. For example, HEAD@{2} means "where HEAD used to be two moves ago", master@{one.week.ago} means "where master used to point to one week ago in this local repository", and so on.

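Then use git reset --hard master@{2}, which resets your current branch, index and working tree to where master pointed two changes ago. Check the git reflog output first to make sure that entry is the one you want, since --hard discards uncommitted changes.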
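
As a small sketch of the reflog syntax (scratch repository, made-up commit messages), the following moves a branch back by mistake and then uses HEAD@{1} to return to where HEAD was one move ago. The right number in your own repository depends on what git reflog shows, so treat the {1} here as specific to this example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
wanted=$(git rev-parse HEAD)

# Simulate accidentally moving the branch back by one commit.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD used to be.
git reflog

# HEAD@{1} means "where HEAD was one move ago".
git reset -q --hard "HEAD@{1}"
recovered=$(git rev-parse HEAD)
echo "recovered $recovered"
```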