
I'm running conda environments on a compute cluster where the total number of files per "project" is restricted (200k files max). I've created only a couple of conda environments (Anaconda with Python 2.7; ~200 Python and R packages installed in each environment; high package overlap between environments) and have already hit that file limit. Even running `conda clean -a` removes only a small fraction of the files. Some Python packages in my conda environments (e.g., boost) contain >10k files each, and `conda clean` does not reduce this.

Is there any way to greatly reduce the number of files stored as part of a conda environment?
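
For reference, this is roughly how to see where the files are going (the paths here are examples; adjust them to your own install):

```
# Count files in one environment (path is an example)
find ~/miniconda2/envs/myenv -type f | wc -l

# The shared package cache counts toward the limit too
find ~/miniconda2/pkgs -type f | wc -l
```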

sharchaea
  • Is it a requirement that you have all the Anaconda packages? Installing Anaconda pulls in hundreds of packages; do you really need all of them? Perhaps you can install Miniconda instead, or simply create a conda environment with just the packages you really need. – Paul Oct 25 '16 at 20:34
  • Yeah, I do need at least most of those packages. Actually, I haven't even added much of the bioinformatics software that I want to include in my conda environments yet. I don't see why conda needs to keep all of these files that are part of the package distributions. I'm surprised that others haven't had issues with the large number of files associated with conda environments. – sharchaea Oct 26 '16 at 20:03
  • So Miniconda with only the necessary packages installed does not help? – Jiren Jin Mar 05 '19 at 16:43
  • 1
    agree to use miniconda and add packages explicitly. If performance is not an issue, you can also tell the python interpreter to not generate bytecode (*.pyc) files. – booleys1012 Mar 13 '19 at 06:29
  • I would start by deleting the pkgs directory, which holds the cache of the downloaded files (also sketched below). – Vikramaditya Gaonkar Jan 30 '20 at 16:26
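
Putting the last two suggestions into concrete commands, a minimal sketch (paths are examples; `conda clean --packages` removes only cached packages that are not linked into any environment, which is the safe way to thin out the pkgs directory):

```
# Stop Python from writing bytecode (*.pyc) files in future runs;
# put this in your shell profile. Startup gets slightly slower since
# modules must be recompiled each time.
export PYTHONDONTWRITEBYTECODE=1

# Delete bytecode files that already exist (path is an example)
find ~/miniconda2/envs -name "*.pyc" -delete

# Remove cached tarballs and any extracted packages that no
# environment links to
conda clean --packages --tarballs
```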

1 Answer

Conda uses hard links to reduce disk space consumption: a package's files are shared between the package cache and every environment the package is installed into. But if the limit is imposed on the number of files rather than on disk space, each hard link counts as a separate file.
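
You can see this sharing directly; files with a link count above 1 are shared with the pkgs cache or another environment, yet each one still counts toward a per-file quota (the path is an example):

```
# Count files in an environment that have more than one hard link,
# i.e. files shared with the pkgs cache or other environments
find ~/miniconda2/envs/myenv -type f -links +1 | wc -l
```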

As discussed in the comments, using Miniconda instead of Anaconda, and installing only the packages you actually need, might help.
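
A minimal sketch of that approach (the environment and package names are examples, not recommendations):

```
# Starting from a Miniconda install, create an environment with only
# the packages you actually need (python=2.7 matches the question)
conda create -n analysis python=2.7 numpy pandas r-base
source activate analysis   # `conda activate analysis` on conda >= 4.4
```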

If this isn't enough, I'd recommend merging several of your environments into one. Then you'll have fewer hard links for the packages that overlap. Of course, that defeats part of the purpose of environments, but such is the nature of workarounds.
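
One way to do the merge, sketched with example environment names (conflicting version pins may require manual editing of the merged YAML):

```
# Export the package lists of the environments to merge
conda env export -n env1 > env1.yml
conda env export -n env2 > env2.yml

# After manually combining the dependency lists into combined.yml,
# create a single environment and remove the originals
conda env create -n combined -f combined.yml
conda env remove -n env1
conda env remove -n env2
```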

Roland Weber
  • 4
    I'm surprised that more people haven't run into problems with the massive number of files that are associate with new conda envs. Even if we just use miniconda, and each user only has a couple of envs that they have created for themselves, 2 envs x 30 users x 5-10k files_per_env = 300k to 600k files! Currently, our miniconda install has ~1.8 million files in it, and that's after running `conda clean --all`. – sharchaea Jun 06 '19 at 11:33
  • The number of files is not an issue on today's file systems anymore. And user limits are typically enforced with quotas on the disk space consumed, not on the number of files. – Roland Weber Jun 07 '19 at 17:10
  • 3
    For full scans or copying of the file system, the number of files can substantially slow things down, especially if the number of files is in the millions. – sharchaea Jun 09 '19 at 05:07