`conda clean --packages` removes unused packages from writable package caches. What is this 'writable package cache', and how is conda able to detect that a package is unused?
Is it actually running through all of the python files and looking for dependencies? Or does it keep a record of what has run before?
Does it ever remove packages that I installed via pip but never used?

If you install A, and A needs B and C, they will be installed. If you now remove A, B and C are not automatically removed because other packages might want them. "Clean" removes those. – Tim Roberts Feb 03 '22 at 17:14
@TimRoberts this is murky territory. I recall a time when Conda used to *prune* unused dependencies, and one can find questions on SO with users asking how to prevent it. As in, “I installed package A, which had B as a dependency, but now I want to remove A, but keep B. How can this be done?” Unfortunately, at some point this pruning stopped working. I’m unsure what the status is now. But more importantly, your example isn’t true: if B is still installed in an environment, it won’t be removed by `conda clean`. Only A would get removed from the cache. – merv Feb 03 '22 at 21:46
1 Answer
Conda counts hardlinks
Conda uses hardlinks to minimize physical disk usage. That is, a single physical copy of `lib/libz.a` may be referenced from the package cache (where it was first unpacked), and then in multiple environments.
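To make the hardlink idea concrete, here is a minimal sketch using only the Python standard library. It is not Conda's code; the file names and the `demo_hardlink_counts` helper are hypothetical, and it assumes a filesystem that supports hardlinks. It shows that the filesystem itself reports how many names point at one physical file.

```python
import os
import tempfile


def link_count(path):
    """Return the number of hardlinks the filesystem reports for `path`."""
    return os.stat(path).st_nlink


def demo_hardlink_counts():
    # Hypothetical stand-ins for a cached file and its copy in an environment.
    with tempfile.TemporaryDirectory() as tmp:
        cached = os.path.join(tmp, "pkgs_libz.a")
        with open(cached, "wb") as fh:
            fh.write(b"example")
        before = link_count(cached)            # only the cache entry exists

        env_file = os.path.join(tmp, "env_libz.a")
        os.link(cached, env_file)              # "install" by hardlinking
        after = link_count(cached)             # two names, one physical file
        same_file = os.path.samefile(cached, env_file)

        os.remove(env_file)                    # "uninstall" from the environment
        after_unlink = link_count(cached)
    return before, after, same_file, after_unlink


print(demo_hardlink_counts())
```

The count goes up when a second name is linked and back down when that name is removed, without any bookkeeping by the program that created the link.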
Conda determines eligibility for removing a package from the package cache by counting the number of hardlinks for the files in each package. Hardlink counts are tracked by the filesystem, not by Conda. An outline of the relevant code is:
# keep a list of packages to remove
pkgs_to_remove = []

# look in all package caches (there can be multiple)
for pkg_cache in pkgs_dirs:
    # check all packages
    for pkg in pkg_cache:
        # assume removable unless...
        remove_pkg = True
        for file in pkg:
            # is there evidence that it is linked elsewhere?
            if num_links(file) > 1:
                # if so, don't remove, and move on
                remove_pkg = False
                break
        # add it to the list if removable
        if remove_pkg:
            pkgs_to_remove.append(pkg)

# output some info on `pkgs_to_remove`
# check if user wants to execute removal
That is, if any file in a package has more than one link, then Conda will conclude it is used in another environment, and move on to the next package.
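The outline can be turned into something runnable along these lines. This is a sketch under stated assumptions, not Conda's implementation: the function names (`is_linked_elsewhere`, `find_removable`, `demo`) and the package names are made up for illustration, and it inspects only extracted package directories inside each cache directory.

```python
import os
import tempfile


def is_linked_elsewhere(pkg_dir):
    """Return True if any file under `pkg_dir` has more than one hardlink."""
    for root, _dirs, files in os.walk(pkg_dir):
        for name in files:
            # lstat inspects the entry itself rather than following symlinks
            if os.lstat(os.path.join(root, name)).st_nlink > 1:
                return True
    return False


def find_removable(pkgs_dirs):
    """List package directories in which no file is hardlinked elsewhere."""
    removable = []
    for cache in pkgs_dirs:
        if not os.path.isdir(cache):
            continue
        for name in sorted(os.listdir(cache)):
            pkg_dir = os.path.join(cache, name)
            if os.path.isdir(pkg_dir) and not is_linked_elsewhere(pkg_dir):
                removable.append(pkg_dir)
    return removable


def demo():
    # Build a throwaway cache with two hypothetical packages; only pkg_b is
    # hardlinked into an "environment" directory.
    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "pkgs")
        env = os.path.join(tmp, "envs", "myenv")
        os.makedirs(env)
        for pkg in ("pkg_a-1.0", "pkg_b-1.0"):
            os.makedirs(os.path.join(cache, pkg))
            with open(os.path.join(cache, pkg, "data.txt"), "w") as fh:
                fh.write(pkg)
        os.link(os.path.join(cache, "pkg_b-1.0", "data.txt"),
                os.path.join(env, "data.txt"))
        return [os.path.basename(p) for p in find_removable([cache])]


print(demo())
```

In the demo, only the package with no links outside the cache is reported as removable; the package that is still linked into the environment is left alone.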
Note that filesystems don't keep a count of symbolic links (a.k.a. symlinks or softlinks) pointing to a file, and Conda doesn't track them either. Hence, Conda warns about cleaning packages in combination with the `allow_softlinks` setting.
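The difference between the two link types can be checked directly. The following is a small illustrative sketch (the `symlink_vs_hardlink` helper and file names are hypothetical, and creating symlinks may require extra privileges on Windows): a symlink leaves the target's link count unchanged, while a hardlink increments it.

```python
import os
import tempfile


def symlink_vs_hardlink():
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "cached_file")
        with open(target, "w") as fh:
            fh.write("data")

        # A symlink is a separate entry that merely stores the target's path.
        os.symlink(target, os.path.join(tmp, "soft"))
        after_symlink = os.stat(target).st_nlink

        # A hardlink is another name for the same physical file.
        os.link(target, os.path.join(tmp, "hard"))
        after_hardlink = os.stat(target).st_nlink
    return after_symlink, after_hardlink


print(symlink_vs_hardlink())
```

A check based only on link counts therefore cannot see that a symlinked file is still in use, which is why cleaning is risky when environments reference the cache through softlinks.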

Suppose there's only one hardlink, then, because I only installed the package for use in the current environment (not base). Would `conda clean --packages` remove it? – Jaden Lorenc Feb 04 '22 at 00:13
@JadenLorenc *"Suppose there is only one hardlink…"* - the only way that is true is if the package does not have any hardlinks to environments. That is, it was downloaded via `conda install --download-only` or it was installed to an environment then removed (`conda install`, `conda remove`). If it is still installed in the environment it won’t be removed. – merv Feb 04 '22 at 03:07
@JadenLorenc I should note that not all files in a package can be hardlinked, so it is possible for a package to have all of its files *copied*. In such a case, the package would be removed from the *cache*, but since it was copied, the environment would not be impacted. This is all *in theory* - i.e., I only know this is possible, but I don't know a concrete example. – merv Feb 04 '22 at 05:18