
As the title suggests, I ran `conda create --name newenv --clone base`.

When it finished, I checked the size of my newenv folder, and it is 3.78 GB. Why does it take up any space at all? If conda uses pointer references to the base packages, and I have not installed any new packages, why does it still take ~4 GB? This seems like a pointless use of space. Is there any way I can reduce it? Thanks.

Vaaal88
  • how do I check that? – Vaaal88 Jan 20 '22 at 17:07
  • While the duplicate explains that disk usage of a cloned environment is *usually* an illusion, there are cases where hardlinks aren't used, such as cloning across volumes or if someone literally disables linking by setting `always_copy: true`. – merv Jan 20 '22 at 19:15

1 Answer


Conda actually does share files between environments already. Because it uses hardlinks, though, it is easy to overestimate the space really being used (read more).
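To make the overestimation concrete, here is a minimal sketch of what a hardlink looks like to the filesystem. It has nothing conda-specific in it: the file names are made up for illustration, and it assumes a filesystem that supports hardlinks (most Linux/macOS filesystems, and NTFS).

```python
import os
import tempfile


def hardlink_demo():
    """Create a file plus a hard link to it and report how the filesystem sees them."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical names: one stands in for a file in the package cache,
        # the other for the "same" file inside an environment.
        original = os.path.join(tmp, "pkg_cache_file.bin")
        linked = os.path.join(tmp, "env_file.bin")

        with open(original, "wb") as fh:
            fh.write(b"\0" * 1024)

        # Second directory entry pointing at the same data on disk.
        os.link(original, linked)

        a, b = os.stat(original), os.stat(linked)
        return {
            "same_inode": (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino),
            "link_count": a.st_nlink,
            # What you get by adding up per-path file sizes:
            "naive_total": a.st_size + b.st_size,
            # What the data actually occupies (one copy):
            "actual_total": a.st_size,
        }


if __name__ == "__main__":
    print(hardlink_demo())
```

Both paths report the full file size, so a tool that sums sizes per folder counts the data once for every folder it appears in, even though only one copy exists on disk.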

In any case, the answer to your question might lie in the difference between Anaconda and Miniconda. Anaconda is about 2 GB, while Miniconda is closer to 100 MB.

Anaconda ships with a long list of packages preinstalled in the base environment, so a clone of base carries all of them along.

Miniconda creates barebone conda virtual environments (which don't contain many packages at all). Switching to Miniconda should substantially reduce the size/number of packages in your environments.

Conda also uses hardlinks for packages installed via `conda install`. A good description of hardlinks can be found here. They basically link the same package files across multiple environments, like you've described above. Packages installed via pip are not hardlinked, so they cannot take advantage of the space savings that conda packages offer.
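If you want a rough idea of how much of an environment folder is really exclusive to it, one option is to look at the link count of each file. The sketch below is an approximation, not something conda provides: `env_disk_usage` is a hypothetical helper, and it assumes that any file with more than one hardlink is shared with the package cache or another environment.

```python
import os
import stat


def env_disk_usage(env_path):
    """Return (apparent_bytes, exclusive_bytes) for a directory tree.

    apparent_bytes:  size of every distinct regular file under env_path.
    exclusive_bytes: size of files with a link count of 1, i.e. files
                     that no other path on disk shares.
    """
    apparent = 0
    exclusive = 0
    seen = set()  # (device, inode) pairs already counted

    for root, _dirs, files in os.walk(env_path):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and special files
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # same data reached through another path
            seen.add(key)
            apparent += st.st_size
            if st.st_nlink == 1:
                exclusive += st.st_size

    return apparent, exclusive
```

Calling `env_disk_usage("/path/to/envs/newenv")` (placeholder path) on a cloned environment should show `exclusive` far below `apparent` when hardlinking worked. If the two numbers are nearly equal, the files were copied, which is what you would expect after cloning across volumes, with `always_copy: true`, or for pip-installed packages.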

Guy Nachshon
  • Ah, yes, I believe this is the answer: https://stackoverflow.com/questions/55566419/why-are-packages-installed-rather-than-just-linked-to-a-specific-environment – juanpa.arrivillaga Jan 20 '22 at 17:39
  • yes, there's a link to this post in my answer :), but it is missing some information for other purposes – Guy Nachshon Jan 20 '22 at 17:42
  • this is REALLY informative, thanks ! I have an additional question: what happens if I installed a package _with pip_ in a conda environment, when this package already exists in another conda environment? Will it be duplicated? – Vaaal88 Jan 20 '22 at 22:05
  • I am thinking of two cases: `conda activate base; pip install torch; conda activate env2; pip install torch` and `conda activate base; conda install torch; conda activate env2; pip install torch`. Does it use hardlinks in either of these cases? – Vaaal88 Jan 21 '22 at 07:16
  • @Vaaal88 `pip` will never use hardlinks, but `conda` can. No reason to use `pip` to install `torch` though, as there is the official `pytorch` channel – FlyingTeller Jan 21 '22 at 08:18
  • I know, it was just the first package I could think of. So, to answer my question, pip will duplicate packages when installed in conda environments, right? – Vaaal88 Jan 21 '22 at 08:32
  • @Vaaal88 not only do PyPI packages get duplicated across environments, but it is becoming more common for PyPI packages to use static builds (using wheels), which means that individual packages each bring their own copy of the libraries they need. It makes installation faster, but at the cost of disk space. Conda solves this outright by precompiling everything (fast installation) with dynamic linking, so all packages that require a library will share it. And the hardlinking means they even share libraries *across* environments. – merv Jan 21 '22 at 19:50
  • Thanks! Really interesting – Vaaal88 Jan 21 '22 at 22:13