
I ran this command to free up disk space used by Anaconda:

$ conda clean --all

However, some large files still remain in Anaconda's pkgs folder.

Is it safe to manually delete all the files in the pkgs folder? Is there any risk of corrupting my Anaconda environment? What side effects, if any, should I expect?

I am using Anaconda 2018 on Windows 10.

GalDude33
guagay_wk
  • 1
    Speaking from personal experience, I have deleted all files in the pkgs folder and have not encountered any problems. However, if you are in doubt, just leave the files alone. I'm sharing my own experience here and cannot guarantee you will not encounter problems if you do the same. I am using Windows 10. – user3848207 Oct 01 '19 at 12:10
  • 1
    Just wanted to note that I have also deleted all the files in pkgs, with no noticeable issues. This is on a cluster running some kind of Debian, with only one conda env in use. It absolutely does free up space - we have strict quota limits and it's very useful – Clumsy cat Jun 02 '20 at 06:45
  • 2
    @Clumsy cat, I did it many times. No problem at all. – guagay_wk Jun 02 '20 at 10:08
  • Why does this folder have 4 different python subfolders, when I only have two environments? `python-3.9.15-h6244533_0` `python-3.10.6-hbb2ffb3_0` `python-3.9.13-h6244533_2` `python-3.10.4-hbb2ffb3_0` One environment uses `python 3.9.15` and the other uses `python 3.10.6` so why is there still `3.10.4` or `3.9.13`? – endolith Nov 17 '22 at 16:52

2 Answers


Actually, under certain conditions, removing the pkgs subdirectories is an option. As stated here by Anaconda Community Support, "the pkgs directory is only a cache. You can remove it completely if you want to. However, when creating new environments, it is more efficient to leave whatever packages are in the cache around."

According to the documentation, you can use conda clean --packages to remove unused packages in pkgs (which will move them to pkgs/.trash, from which you can then safely delete them). This does not check for packages installed using symlinks back to the package cache, but that is not an issue if you don't use such environments or if you work under Windows. I guess that's why conda clean --packages is included in conda clean --all.

To save space more aggressively, you can use conda clean --force-pkgs-dirs to remove all writable package caches (with the same caveat that there could be environments linked to these directories). If you don't use environments or you use Anaconda under Windows, you're probably safe. Personally, I use this option without issues.
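If you would rather inspect the cache before removing anything, a read-only check along these lines can help. This is only a rough sketch and not conda's own logic: `unlinked_packages` and `has_external_link` are hypothetical helper names, and the idea (an extracted package whose files all have a hard-link count of 1 is probably not linked into any environment) shares the caveat above, since it cannot see environments that use symlinks or copies.

```python
import os


def has_external_link(package_dir):
    """Return True if any file under package_dir has more than one hard link."""
    for dirpath, _dirnames, filenames in os.walk(package_dir):
        for name in filenames:
            if os.lstat(os.path.join(dirpath, name)).st_nlink > 1:
                return True
    return False


def unlinked_packages(pkgs_dir):
    """List extracted package directories in pkgs_dir with no hard-linked files.

    Hypothetical helper: it only reads metadata and never deletes anything.
    Loose files such as downloaded archives are skipped.
    """
    with os.scandir(pkgs_dir) as entries:
        dirs = sorted(e.path for e in entries if e.is_dir(follow_symlinks=False))
    return [os.path.basename(d) for d in dirs if not has_external_link(d)]
```

Call it with the path to your own package cache (conda info lists the package cache directories); the names it returns are only candidates, so treat the result as a hint rather than a guarantee.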

user343233
Robert

Edit Commentary

After reviewing the documentation pointed out in @Robert's answer, I must admit my initial response was overly alarmist and, in parts, blatantly incorrect. My apologies for the misleading response.

Nevertheless, I believe some of what I raised still has merit for this thread, so I have decided to retain the answer with amendments. In particular, it is worth emphasizing that deleting the pkgs directory may not actually achieve what the OP was hoping for (saving space), and that removing the package cache undermines Conda's redundancy-minimization strategy going forward by making it impossible to share already-installed packages.

Instead, my final recommendation concurs with what @Robert suggested: use conda clean -p to delete unused packages, but keep the cache (pkgs dir) so that future environments can still leverage hardlinks. One last point to note is that some tools, such as conda-pack, rely on the integrity of the package cache in order to work, so deleting pkgs will prevent their use.


Amended Original Response

No, it is definitely not safe, and in fact the only way you would actually free disk space is if you broke your base env. The issue is that all envs use hardlinks to the pkgs directory, so even if you delete the link located in the pkgs directory, the links in the envs will still be there, and you won't delete any physical files on the disk. The only real deletion you might do is of something that is only referenced by base, i.e., the only copy is in pkgs, hence the potential for breaking base.

Correction: The base env still links packages to other locations, so deleting pkgs will not impact base as I originally concluded.
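To make the hardlink point concrete, here is a minimal sketch that does not involve conda at all. It creates one file with two names in a temporary directory, deletes one name, and shows that the data is still reachable through the other. The file names are made up for illustration and only stand in for a file in pkgs and its link inside an env.

```python
import os
import tempfile


def demo_hardlink_deletion():
    """Create a file with two hard links, delete one, and report what survives."""
    with tempfile.TemporaryDirectory() as root:
        cache_copy = os.path.join(root, "pkgs_copy.bin")  # stands in for the file in pkgs
        env_copy = os.path.join(root, "env_copy.bin")     # stands in for the link in an env
        with open(cache_copy, "wb") as fh:
            fh.write(b"x" * 4096)
        os.link(cache_copy, env_copy)             # second name for the same data
        links_before = os.stat(env_copy).st_nlink
        os.remove(cache_copy)                     # "delete the file in pkgs"
        links_after = os.stat(env_copy).st_nlink
        size_after = os.path.getsize(env_copy)    # data is still fully there
    return links_before, links_after, size_after


if __name__ == "__main__":
    print(demo_hardlink_deletion())
```

The link count drops from 2 to 1, but the remaining name still holds all 4096 bytes, so no disk space was released by the deletion.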

I'd highly recommend looking at this other post on estimating the real disk usage of Conda. You may be overestimating how much space is really being used. For most files in pkgs, there is only one physical copy, so there isn't any additional manual optimization to be done.
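As a rough illustration of that kind of estimate (my own sketch, not the method from the linked post), the function below walks a directory tree and reports two totals: the apparent size, which counts every path, and the physical size, which counts each underlying file only once even if it is hard-linked under several names. `disk_usage` is a hypothetical helper name, and on Windows it assumes Python fills in `st_ino` for the file system in use.

```python
import os
import stat


def disk_usage(root):
    """Return (apparent_bytes, physical_bytes) for regular files under root.

    apparent counts every path; physical counts each (device, inode) pair once,
    so files that are hard-linked together are not double counted.
    """
    apparent = 0
    physical = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks and special files
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                physical += st.st_size
    return apparent, physical
```

Running it on the directory that contains both pkgs and envs gives the most meaningful comparison, because the links between the two are then counted once. A large gap between the two numbers suggests that hardlinks are already doing the deduplication.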

merv
  • 2
    Oh dear. I just deleted all the files in pkgs folder. So far, I can still use conda python normally without hiccups. I suspect this is because I do not use virtual environments. I only have 1 single conda base environment. So, deleting the files shouldn't matter if I only have one base environment? – guagay_wk May 23 '19 at 03:08
  • 1
    @user781486 interesting to see if everything still works. Perhaps **base** also has other links elsewhere of which I am unaware. If you really want to save space, consider [migrating to Miniconda](https://stackoverflow.com/q/56050217/570918). In your case, you probably don't even want to export a YAML, since it sounds like you only have Anaconda. You could just list the packages you actually use, and start making new envs. – merv May 23 '19 at 03:13
  • https://groups.google.com/a/continuum.io/forum/#!topic/anaconda/xV1BiGPmgao This seems to say it is OK, but I don't quite understand the downside. Can you help? "conda was designed in a way that the 'pkgs' directory is a download cache, as well as a place where the downloaded conda packages get extracted. The files from the extracted tarballs then get (hard) linked into conda environments. So while removing the entire 'pkgs' directory is possible, the downside is that when you create new environments, the shared files from existing packages (in other environments) will no longer share the hard links." – guagay_wk May 23 '19 at 03:19
  • Does "no longer share the hard links" mean that when you create new env in future, the library files will be re-downloaded? So future virtual envs will result in a huge disk space increase? – guagay_wk May 23 '19 at 03:21
  • 2
    I think even the base environment hard links the files from the pkgs directory into the appropriate location depending on your OS. Hence, you didn't run into any problems. – darthbith May 23 '19 at 03:22
  • @user781486 "*when you create new env in future, the library files will be re-downloaded*" That's what I'd expect, but I don't know if Conda has an alternate mechanism for keeping track of what is physically there, outside of checking what is in `pkgs`. You could easily test this. Try creating a new env, and make one of your existing packages (including build specification) a requirement. Does it attempt a redownload? – merv May 23 '19 at 03:28
  • @darthbith which also means that the OP didn't save any space after all :| – merv May 23 '19 at 03:34
  • If I don't create new virtual environments, I still get to save space. – guagay_wk May 23 '19 at 03:37
  • 2
    @user781486 I doubt you have saved any physical space, but instead have only created the appearance of doing so. I still strongly believe that if you did save space, you likely broke something. You've also potentially created a situation where Conda may no longer be able to reuse the packages you currently have when creating new envs. But you'd need to run something like the test I suggested to verify that. – merv May 23 '19 at 03:48
  • 2
    @user781486 after reviewing what was posted in [this answer](https://stackoverflow.com/a/64005961/570918), I've decided to edit my answer. I think it would be appropriate for you to change the accepted answer to that one instead of mine. Fortunately, I think I erred on the side of caution, but I still must fess up: I was wrong and I apologize for misleading you. – merv Dec 03 '20 at 19:42