
Does GIT automatically remove local files after they are removed from the repository? In my case, described below, it looks like the answer is NO. So what's the trick? Does it depend on HOW the files are removed?

My case: I had a coworker clone our master repo, then he created a branch, then he removed a directory containing hundreds of files. Then he submitted a pull request, I merged it to master, then I checked the web-view of the master repo and that directory is GONE. PERFECT.

BUT next, I tried to "sync" my branch, I tried to "update from Master", and STILL that directory is sitting there in my LOCAL file system. GitHub Desktop says my branch is "in sync" with master. OK... maybe it is. But why doesn't Git remove the files from my local folder? They are NO LONGER in the repository!

Jonathan Hall
  • Oh... in the repo? They _are_ still there... as part of a revision (or many, most likely). If you check out one of the revisions where the files were present, you will get them as part of the checkout process, so they are in the repo. Now, they are not _at the tip_ of a branch? Then we are talking. If you check out the _remote_ branch master, are the files there? – eftshift0 Apr 05 '17 at 19:37
  • I removed them from the MASTER repo. They are gone. If I sync my branch or update my branch from master, the files that were removed from the MASTER repo are STILL sitting in my local folder. Looks like I have to remove them manually. – supercommando440 Apr 05 '17 at 19:42
  • What is the output of your `git status`? – CodeWizard Apr 05 '17 at 19:45
  • I think there's a semantic problem here. "Syncing" could mean a ton of stuff (just fetching with no merging, fetching/merging, pulling while working on another branch, pulling while working on local branch master, etc). What I want you to try is something straightforward, with no ambiguities: fetch (if you haven't done so) and check out the _remote_ branch master (because the removal of the files has already been merged there) and let us know if the files are still sitting on your FS (they shouldn't be). – eftshift0 Apr 05 '17 at 19:46
  • Here's another problem: I'm looking at MASTER and at a file I just committed to master that contains a change I made. Then in GitHub Desktop I created a new BRANCH from MASTER. Then I "publish" it to remote (that's how the GitHub Desktop option reads). After that I see my new branch in the web-view. The URL is for the MASTER repo. The branch count incremented. I clicked to see all branches; I see my new one. Then I view the branch. The change I made to the file is NOT THERE. WOW! This Git stuff is NOT working for me. – supercommando440 Apr 05 '17 at 19:47

2 Answers


Git does not remove files from the repository's history. A commit can delete files, so later commits no longer contain them, but earlier commits still do; the files stay in the repo unless you rewrite history with git filter-branch or a similar tool such as BFG.

To verify that you have the latest version, run git fetch first (so that origin/master reflects the remote) and then use this command:

git log ^master origin/master

It lists the commits that are on origin/master but not on your local master, so any output means you are not fully synced with the remote.

BUT next, I tried to "sync" my branch, I tried to "update from Master"

Assuming you did a git pull origin master, there should be no difference between the two branches, so the above command should return nothing.
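The ^master part excludes everything reachable from master, so the command lists commits that exist on origin/master but not on master. Here is a minimal Python model of that set difference, assuming a made-up three-commit history; the commit names and the helpers reachable and missing_commits are hypothetical illustrations, not Git internals.

```python
from typing import Dict, List, Set

# Hypothetical history: each commit ID maps to the IDs of its parents.
Parents = Dict[str, List[str]]


def reachable(history: Parents, start: str) -> Set[str]:
    """Every commit reachable from `start` by following parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(history.get(commit, []))
    return seen


def missing_commits(history: Parents, remote_tip: str, local_tip: str) -> Set[str]:
    """Commits reachable from the remote tip but not from the local tip (^local remote)."""
    return reachable(history, remote_tip) - reachable(history, local_tip)


# Made-up linear history A <- B <- C; origin/master (as last fetched) points at C.
history: Parents = {"A": [], "B": ["A"], "C": ["B"]}

print(sorted(missing_commits(history, remote_tip="C", local_tip="B")))  # ['C']
print(sorted(missing_commits(history, remote_tip="C", local_tip="C")))  # []
```

With local master still at B, commit C is reported as missing; once master has caught up to C, the result is empty, which matches "the above command should return nothing".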

CodeWizard
  • Note, I'm not using ANY command-line commands. I'm doing all this using GitHub Desktop. So to clarify, are you saying Git does NOT remove files from the local file system after they are removed from the repository that is pointing at the local file system? If that's true, good to know. – supercommando440 Apr 05 '17 at 19:59
  • OK... it DOES remove files. I think I was experiencing a delay due to the number of files deleted, and also working off a slow wifi connection. After about 30 min, after "update from Master" and then "sync", I noticed in the WEB-view of the git repository (master and branch), AND in the GitHub Desktop view of master and branch, AND on my local file system, that the removed files in question are GONE. Yes, even on MY local file system (after being removed by another developer). – supercommando440 Apr 05 '17 at 20:30

To understand how this all really works, and when and why Git removes files, you must hold three ideas in your head simultaneously:

  • Git stores, in a repository, commits.

    Each commit is a snapshot of a work-tree—sort of; see below. It's complete in and of itself. Once you make a commit, it never changes. It's designed (mathematically) to be impossible to change: the identity—the "true name"—of any commit is a cryptographic checksum of the contents of that commit. This means that if you change even a single bit of any file within a commit, or add any new file or remove any old file, what you get is a new, different commit, with a different name.

  • A Git repository has a work-tree.

    The format of files inside commits is something only Git itself can use. If Git never let you edit, view, and otherwise use your files, it would be useless. So each repository has a work-tree, which is basically a place where Git has expanded those files into their normal form, so that all the rest of the programs on the computer—and you yourself—can use them.

  • A Git repository has an index.

    Work-trees and commits are quite different, but you can convert a work-tree into a commit, or a commit into a work-tree. Git's index is the intermediate place "between" work-tree and commit. Git is unusual in exposing this thing: other version control systems sometimes have something that is like Git's index, but most keep it hidden. Git does not.

    In any case, the index is the key to all of this.
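To make the "true name" idea concrete, here is a small Python sketch. blob_id follows the format Git uses to name file contents in a default SHA-1 repository (a "blob <size>" header, a NUL byte, then the bytes, hashed with SHA-1), so it should agree with git hash-object for the same bytes. snapshot_id is a deliberately simplified, hypothetical stand-in for a commit ID: real tree and commit objects have a different layout and also hash the author, date, message and parent IDs. The file names and contents are made up.

```python
import hashlib
from typing import Dict

# A snapshot maps each path to that file's full contents.
Snapshot = Dict[str, bytes]


def blob_id(data: bytes) -> str:
    """SHA-1 of a 'blob <size>' header, a NUL byte, then the data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def snapshot_id(snapshot: Snapshot) -> str:
    """Simplified stand-in for a commit name: one hash over every (path, blob ID) pair.

    Real Git tree and commit objects use a different layout and include
    metadata; this keeps only the "name = checksum of contents" idea.
    """
    h = hashlib.sha1()
    for path in sorted(snapshot):
        h.update(path.encode() + b"\x00" + blob_id(snapshot[path]).encode() + b"\n")
    return h.hexdigest()


base = {"README": b"docs\n", "zorg.py": b"print('hi')\n"}
edited = {**base, "README": b"docs!\n"}  # one file's contents changed
file_removed = {"README": b"docs\n"}     # zorg.py deleted

print(blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(len({snapshot_id(base), snapshot_id(edited), snapshot_id(file_removed)}))  # 3
```

Editing one file or deleting one file both yield a new snapshot ID, while hashing identical contents again always reproduces the same ID.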

If you are writing new commits, the best way to describe, and think about, the index is that it is where you build the next commit. There are a bunch of reasons for this, some better than others, and many having to do with speed (of extracting old or making new commits). There are, however, several key features that Git gives you via the index, and these force you to know what it is and how to use it.

In particular, in a lot of computing systems, we want or need to keep, in the work-tree, files that will never be committed. For instance, with compiled languages, we have the source code, and then the compiler output files. Projects may have site-specific configurations. There are a lot of good reasons to want to keep un-versioned files mixed in with the versioned files (in some cases, they may even be versioned, but separately from the source—Git is not very helpful here though).

Hence, the index is kind of a go-between, sitting between the permanent commits and the temporary but useful-to-things-other-than-Git work-tree. Besides just letting you stage files for the next commit, though, the index keeps track of which files you have extracted from the current commit. (More precisely, it keeps track of which version of each file you have, which is one of the ways Git manages to be as fast as it is.)
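A toy Python model of "the index is where you build the next commit", assuming plain dicts for the work-tree and the index and hypothetical helpers stage (standing in for git add) and make_commit. It shows two points from the text above: un-versioned files such as compiler output can sit in the work-tree without ever entering a commit, and a commit records the staged version of a file, not whatever the work-tree holds at commit time.

```python
from typing import Dict

# Toy model: each map goes from path to file contents (all names are hypothetical).
work_tree: Dict[str, str] = {
    "main.c": "int main(void) { return 0; }\n",
    "main.o": "<compiler output>",        # build product we never want to commit
    "site.cfg": "db_host = localhost\n",  # site-specific, un-versioned settings
}
index: Dict[str, str] = {}


def stage(path: str) -> None:
    """Stand-in for `git add`: copy the work-tree version of one file into the index."""
    index[path] = work_tree[path]


def make_commit() -> Dict[str, str]:
    """A new commit is a frozen copy of the index, not of the work-tree."""
    return dict(index)


stage("main.c")
work_tree["main.c"] = "int main(void) { return 1; }\n"  # edited after staging, not re-added
snapshot = make_commit()

print(sorted(snapshot))                           # ['main.c']
print(snapshot["main.c"] == work_tree["main.c"])  # False: the commit holds the staged text
```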

Here's the answer about what gets removed

When you move from one commit to another—as you do with git checkout of some commit other than the current one, and the so-called fast-forward variety of git merge—Git will remove from your work-tree any file that:

  • is in the index
  • but is not in the new commit

It will then add to your work-tree any file that:

  • is in the new commit
  • but is not already at that version in your index

In other words, the index not only lets you build the next commit, it also remembers what you have in your work-tree. If you move from (previously current) commit badc0ffee to (newly current) commit faceacafe, and your index says that you have version deadc0de3 of file zorg.py that went with badc0ffee but new commit faceacafe has no zorg.py, Git will remove zorg.py.
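The two rules above can be sketched in Python as a toy model, assuming that the index and each commit are simply maps from path to version ID and that the work-tree starts out clean; plan_checkout and checkout are hypothetical helpers, not Git's implementation. The example replays the zorg.py case, with build.log standing in for an untracked file.

```python
from typing import Dict, Set, Tuple

# path -> version ID, for the index and for commits (IDs below are made up)
Tree = Dict[str, str]


def plan_checkout(index: Tree, new_commit: Tree) -> Tuple[Set[str], Tree]:
    """Return (files to delete, files to write) following the two rules above."""
    to_remove = {path for path in index if path not in new_commit}
    to_write = {path: ver for path, ver in new_commit.items() if index.get(path) != ver}
    return to_remove, to_write


def checkout(index: Tree, work_tree: Tree, new_commit: Tree) -> None:
    """Apply the plan to both the index and the work-tree (clean work-tree assumed)."""
    to_remove, to_write = plan_checkout(index, new_commit)
    for path in to_remove:
        del index[path]
        work_tree.pop(path, None)
    for path, ver in to_write.items():
        index[path] = ver
        work_tree[path] = ver  # stand-in for "extract this version into the file"


# Index and work-tree as left by checking out badc0ffee.
badc0ffee: Tree = {"zorg.py": "deadc0de3", "main.py": "1111111"}
index: Tree = dict(badc0ffee)
work_tree: Tree = {**badc0ffee, "build.log": "untracked"}  # never in the index

faceacafe: Tree = {"main.py": "2222222"}  # zorg.py is gone in this commit
checkout(index, work_tree, faceacafe)

print(sorted(work_tree))  # ['build.log', 'main.py']
```

zorg.py disappears because it was in the index but not in faceacafe, main.py is rewritten because its version changed, and build.log is left alone because the index never knew about it. Afterwards the index matches the new commit again, ready for the next move.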

For all this to work, your index must match your current-before-you-change-it commit.

Fancy GUI front ends may hide, or try to hide, the index from you. This is usually a mistake since it's so central to proper Git operation.

Some extra side notes

The above glosses over the protections that Git gives you about checking out commits while you have modified files (called a "dirty work tree" or "dirty index"). Assuming you don't do this—you never modify some work-tree files and then, deliberately or accidentally, fail to stage (git add) and/or commit them—your index will always match your current commit. To change commits, Git changes the files in the index; and in the process, it changes those files—and only those files—in the work-tree.

If you do deliberately set up a dirty index and/or work-tree, Git will try to let you change commits anyway. This succeeds only if, for each file you have "dirtied", the new commit stores the same version of that file as the current commit does. This works precisely because Git only updates those files that are, as it were, "wrong in the index" for the new commit. For (much) more on this, see Git - checkout another branch when there are uncommitted changes on the current branch.
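A simplified Python model of that safety check, assuming the same path-to-version maps; dirty_paths and can_switch are hypothetical helpers. A path counts as dirty if the index differs from the current commit (staged) or the work-tree differs from the index (unstaged), and the switch is allowed only when every dirty path has the same version in the current and new commits. Real Git handles extra cases this ignores, such as refusing when an untracked file would be overwritten.

```python
from typing import Dict, Set

# path -> version ID for commits, the index and the work-tree (all made up)
Tree = Dict[str, str]


def dirty_paths(commit: Tree, index: Tree, work_tree: Tree) -> Set[str]:
    """Paths whose staged or unstaged state differs from the current commit."""
    staged = {p for p in set(commit) | set(index) if index.get(p) != commit.get(p)}
    unstaged = {p for p in index if work_tree.get(p) != index[p]}
    return staged | unstaged


def can_switch(current: Tree, new: Tree, dirty: Set[str]) -> bool:
    """Simplified rule: allowed only if every dirty path is identical in both commits."""
    return all(current.get(p) == new.get(p) for p in dirty)


current: Tree = {"a.txt": "v1", "b.txt": "v1"}
index: Tree = dict(current)
work_tree: Tree = {"a.txt": "v1-edited", "b.txt": "v1"}  # a.txt modified, not staged
dirty = dirty_paths(current, index, work_tree)

print(can_switch(current, {"a.txt": "v1", "b.txt": "v2"}, dirty))  # True: a.txt same in both
print(can_switch(current, {"a.txt": "v2", "b.txt": "v1"}, dirty))  # False: would clobber a.txt
```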

torek
  • Too much info... repository tools shouldn't be this complicated. I'm doing some real simple stuff... change a file, submit pull req. Someone else approves and merges. Meanwhile, back on my local file system, the changes to the file I changed somehow got reverted. WOW. – supercommando440 Apr 06 '17 at 22:34
  • Sorry, but version control is *not* that simple. You can do simple things with it, but when it goes wrong—and clearly, it is going wrong—you *must* understand at least *some* of the underlying elements to "root cause" the problem and fix it, because version control itself is not that simple, and distributed systems are even less simple, and you are mixing the two. It's probably not something *you* are doing that is causing the problem, but the question is, do you want it fixed? – torek Apr 06 '17 at 23:01
  • I meant "shouldn't be so complicated to use". Sure, there's probably a LOT going on under the hood, in Git, and even in SVN. But SVN is SIMPLE to use compared to Git. Unfortunately I am being forced to use Git. I would never recommend it. – supercommando440 Apr 06 '17 at 23:12
  • Yes, SVN has a central server, which throws a lot of complexity out. That has its own drawbacks, of course. – torek Apr 06 '17 at 23:30