Removing pycache in git

Question

How can I remove existing and future pycache files from a Git repository on Windows? The commands I found online are not working. For example, when I run the command "git rm -r --cached __pycache__", I get the error "pathspec '__pycache__' did not match any files".

@MarcusMüller it's `__pycache__`, not `.pycache`. The OP didn't format the text properly, so the double underscores turned the text bold. — phd, Nov 16 '22 at 14:37
@phd Even after I added .pycache, I get the response "fatal: pathspec '.pycache' did not match any files" — dit, Nov 16 '22 at 14:38
ah if there's actually double underscores here, dit: good news, the `__pycache__` directory isn't part of your git repo. So you've got nothing to remove from the index. — Marcus Müller, Nov 16 '22 at 15:15
@MarcusMüller I wish it was true but __pycache__ folders are there when I search. However, I am not sure what you mean by __pycache__ directory. — dit, Nov 16 '22 at 15:28
they're there, but not in the git index; they're *untracked*. So it's not `git rm`'s job to delete them. I've answered that before: — Marcus Müller, Nov 16 '22 at 15:30
Does this answer your question? [Git Shell: How can we remove specific file from untracked files](https://stackoverflow.com/questions/38564613/git-shell-how-can-we-remove-specific-file-from-untracked-files) — Marcus Müller, Nov 16 '22 at 15:31
The terms "folder" and "directory" are used interchangeably these days. `__pycache__` is a directory; `__pycache__` is a folder: these mean the same thing. Python creates `__pycache__` directories / folders when loading `.py` files, so that future loading of the same `.py` files goes faster. — torek, Nov 22 '22 at 05:18

score 1 · Answer 1 · answered Nov 22 '22 at 06:41

The `__pycache__` folders that you are seeing are not in your current Git commits, and they don't have to go into future ones either. Because of the way Git works internally (which Git forces you to know, at least if you're going to understand it), this is a bit tricky to grasp, even once we get past the "directory / folder confusion" we saw in your comments.

The right place to start, I believe, is at the top. Git isn't about files (or even files-and-folders / files-and-directories). Those new to Git see it as storing files, so they think it's about files, but that's just not true. Or, they note the importance of the ideas behind branches, and think that Git is about branches, and that too is not really true, because people confuse one kind of "branch" (that does matter) with branch names (which don't matter). The first thing to know, then, is that Git is really all about commits.

This means that you really need to know:

- what a commit is, and
- what a commit does for you

(these two overlap but are not identical). We won't really cover what a commit is here, for space reasons, but let's look at the main thing that one does for you: Each commit stores a full snapshot of every file.

We now need a small digression into files and folders and how Git and your OS differ in terms of how they organize files. Your computer insists that a file has a name like file.ext and lives in a folder or directory—the two terms are interchangeable—such as to, which in turn lives in another folder such as path. This produces path/to/file.ext or, on Windows, path\to\file.ext.

Git, by contrast, has only files, and their names always use forward slashes and include the slashes. The file named path/to/file.ext is literally just the file, with that name. But Git does understand that your computer demands the file-in-folder format, and will convert back and forth as needed. If Git needs to extract a file whose name is some/long/file/name.ext, Git will create folders some, some/long, and so on when it must, all automatically.
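To see this flat naming for yourself, here is a minimal sketch. It assumes a POSIX-style shell such as Git Bash, and it works in a throwaway repository created with `mktemp`; the file and folder names are just the ones used in the text above.

```shell
# Throwaway repository, so nothing here touches a real project.
repo=$(mktemp -d) && cd "$repo" && git init -q

mkdir -p path/to
echo "hello" > path/to/file.ext
git add path/to/file.ext

# The index holds a single entry whose name is the whole slash-separated
# path; there is no separate entry for "path" or "path/to".
tracked=$(git ls-files)
echo "$tracked"
```

`git ls-files` lists the names Git has in its index, which is a handy way to check what Git actually knows about.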

A strange side effect of this is that because Git stores only the files, not the folders, Git is unable to store an empty folder. This limitation actually arises in Git's index, aka the staging area, which we won't get into in any detail, but it explains the problem whose answers are given in "How do I add an empty directory to a Git repository?"
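The empty-folder limitation is easy to reproduce. This is a small sketch in a throwaway repository (the folder name `empty_dir` is made up for the demo):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

mkdir empty_dir
git add empty_dir          # adds nothing: there is no file to record

# The index is still empty, so a commit could not contain empty_dir.
staged=$(git ls-files)
echo "staged entries: '${staged}'"
```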

In any case, commits in Git store files, using these path names. Each commit has a full copy of every file—but the files' contents are stored in a special, Git-ized, read-only, Git-only format in which the contents are de-duplicated. So if a million commits store one particular version of one particular file, there's really only one copy, shared between all million commits. Git can do this kind of sharing because, unlike regular files on your computer, files stored in a commit, in Git, literally can't be changed.
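The de-duplication can be observed at small scale. In this sketch (throwaway repository, made-up file names), two files with identical contents end up pointing at the same stored object:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

printf 'same content\n' > a.txt
printf 'same content\n' > b.txt
git add a.txt b.txt

# Each staged entry is listed as: mode, object ID, stage number, path.
git ls-files -s

# Pull out just the object IDs; identical contents give identical IDs,
# so Git keeps one stored object for both names.
id_a=$(git ls-files -s a.txt | cut -d' ' -f2)
id_b=$(git ls-files -s b.txt | cut -d' ' -f2)
```

The same idea applies across commits: a file that does not change from one commit to the next keeps the same object ID, so it is stored once.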

Going back to the commits now: each commit contains a full snapshot of every file (that it had when you, or whoever, made the commit). But these files are read-only—they literally can't have their contents replaced, which is what enables that sharing—and only Git itself can even read them. This makes them useless for actually getting any work done. They're fine as archives, but no good for real work.

The solution to this problem is simple (and the same as in almost all other version control systems): when you select some commit to work on / with, Git will extract the files from that commit. This creates ordinary files, in ordinary folders, in an ordinary area in which you can do your work (whether that's ordinary or substandard or exemplary work is all up to you, not to Git). That area is your working tree or work-tree (Git uses these two terms interchangeably). More importantly, it means this: the files you see and work on / with are not in Git. They may have just been extracted by Git, from some commit. But now they're ordinary files, and you use them without Git being aware of what you're doing.

Since Git has extracted these files into ordinary folders, you can create new files and/or new folders if you like. When you run Python programs, Python itself will, at various times, create `__pycache__` folders and stuff `*.pyc` files (and, in Python versions before 3.5, `*.pyo` files) into them. Python does this without Git's knowledge or understanding.
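Here is a sketch of that situation in a throwaway repository. It does not run Python; it creates an empty stand-in file with an illustrative name (`mod.cpython-39.pyc`) where Python would put its cache, then asks Git what it sees:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

mkdir -p pkg
echo 'print("hi")' > pkg/mod.py
git add pkg/mod.py

# Stand-in for what Python would write on import (name is illustrative).
mkdir -p pkg/__pycache__
: > pkg/__pycache__/mod.cpython-39.pyc

# The index only knows about the file that was added.
tracked=$(git ls-files)

# Untracked files are marked "??" in the short status format.
status=$(git status --porcelain --untracked-files=all)
echo "$status"
```

The cache file exists on disk, but it shows up only as an untracked path; it is not in the index and so would not be part of the next commit.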

Because these files are generated by Python, based on your source, and just used to speed up Python, it's a good idea to avoid putting them into the commits. There's no need to save a permanent snapshot of these files, especially since the format and contents may depend on the specific Python version (e.g., Python 3.7 generates *.cpython-37.pyc files, Python 3.9 generates *.cpython-39.pyc files, and so on). So we tell Git two things:

1. Don't complain about the existence of these particular untracked files in the working tree.
2. When I use an en-masse "add everything" operation like `git add .`, don't add these files to the index / staging-area, so that they won't go into the next commit either.

We generally do this with the (poorly named) `.gitignore` file. Listing a file name in a `.gitignore` does not make Git ignore it outright; instead, it has the effect of doing the two things I listed here, and only for files that are not already tracked.
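Both effects can be checked in a throwaway repository. This sketch assumes a one-line `.gitignore` containing `__pycache__/`; the package and cache file names are made up for the demo:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

mkdir -p pkg/__pycache__
echo 'print("hi")' > pkg/mod.py
: > pkg/__pycache__/mod.cpython-39.pyc   # illustrative cache file

# Trailing slash: match directories named __pycache__ at any depth.
printf '__pycache__/\n' > .gitignore

# Show which rule (file, line, pattern) matches the cache file.
git check-ignore -v pkg/__pycache__/mod.cpython-39.pyc

git add .                                 # the en-masse add
staged=$(git ls-files)                    # cache file is not included
untracked=$(git status --porcelain | grep '^??')   # nothing reported
printf '%s\n' "$staged"
```

After `git add .`, the index holds `.gitignore` and `pkg/mod.py` only, and `git status` no longer reports the cache folder as untracked.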

This uses the Git-specific term untracked file, which has a simple definition with a complex back-story. An untracked file is simply any file in your working tree that is not currently in Git's index (staging area). Since we're not going to get into a discussion of Git's index here, we have to stop there for now, but the general idea is that we don't allow the `__pycache__` files to get into the index, which keeps them untracked, which keeps Git from committing them, which keeps them from getting into Git's index. It's all a bit circular here, and if you accidentally do get these files into Git's index, that's when you need the `git rm -r --cached __pycache__` command.

Since that command is failing, it means you don't have the problem this command is meant to solve. That's good!
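If you want to double-check, `git ls-files` is a more direct test than `git rm`, because a plain `__pycache__` pathspec is matched relative to the current directory and will not find a `__pycache__` folder nested deeper in the tree. The sketch below runs in a throwaway repository with made-up names and assumes a POSIX-style shell such as Git Bash. It first reproduces the failure from the question, then simulates the accidental case and untracks the files with a `:(glob)` pathspec, which is one possible way to match the folder at any depth:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir -p pkg/__pycache__
echo 'print("hi")' > pkg/mod.py
: > pkg/__pycache__/mod.cpython-39.pyc     # illustrative cache file
git add pkg/mod.py                         # cache file stays untracked

# Nothing tracked: ls-files prints no cache paths, git rm finds no match.
before=$(git ls-files | grep '__pycache__/')
git rm -r --cached __pycache__ 2>/dev/null
rm_status=$?                               # non-zero, as in the question

# Simulate the accident, then untrack at any depth; files stay on disk.
git add pkg/__pycache__
git rm -r -q --cached -- ':(glob)**/__pycache__/**'
after=$(git ls-files | grep '__pycache__/')
```

Because of `--cached`, only the index entries are removed; the cache files themselves remain in the working tree, where a `.gitignore` entry keeps them from being added again.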

score 0 · Answer 2 · answered Nov 22 '22 at 06:49

Well, you don't need `__pycache__` files in your Git repositories, and you'd better ignore all files related to it by adding `__pycache__/` to your `.gitignore` file.

Javad
