5

I'm tearing my hair out here, hopefully someone can help me.

Running snakemake 4.8.0

I have a snakemake pipeline that uses two conda envs via --use-conda, and it works fine when run as a standalone pipeline.

However, when I run on our cluster, I get the error:

"The 'conda' command is not available in $PATH."

Now. Anaconda is installed on our cluster, but we need to activate it on nodes with:

module load anaconda

Also, module is defined as a function, so I have to source a couple of things first. Therefore, at the top of my snakefile, I have:

shell.prefix("source $HOME/.bashrc; source /etc/profile; module load anaconda; ")

This doesn't solve the problem.

I even put module load anaconda in my .bashrc, and that still doesn't work. Only on cluster execution, I get the error about conda not being found.

Other changes to my .bashrc are picked up by snakemake, so I have no idea why it is having problems with conda.

I even created a conda env, installed snakemake and conda into that env, and activated the env both in the submission script and in the Snakefile:

shell.prefix("source $HOME/.bashrc; source /etc/profile; module load anaconda; source activate MAGpy-3.5; ")

And it still says "The 'conda' command is not available in $PATH."

Literally tearing my hair out.

As an aside, I submit using qsub -S /bin/bash and also use shell.executable("/bin/bash") but the temp shell scripts created in .snakemake are run by /bin/sh - is that expected?

Please help me!

  • Sorry, the above error message was from an old version of snakemake, what I get in 4.8.0 is "subprocess.CalledProcessError: Command 'which conda' returned non-zero exit status 1" – Mick Watson Apr 05 '18 at 20:31
  • Is the snakemake command running in a script that you `qsub`? The error seems to be coming from the process that creates (rather than activates) the conda environments, and this in turn seems to be run by snakemake itself, not by jobs spawned by snakemake. – Peter van Heusden Apr 05 '18 at 20:45
  • Yes it is, and that job is fine because in that job I "module load anaconda" then "source activate env" and then run snakemake. That job perfectly happily creates the two conda envs in the Snakefile (I am using --use-conda) but then the jobs my initial job submits fail with the subprocess.CalledProcessError: Command 'which conda' returned non-zero exit status 1 problem, which means it is the secondary jobs that cannot find conda, whereas the primary job can – Mick Watson Apr 05 '18 at 20:51
  • Can you put something in ~/.profile or ~/.bashrc to verify it's actually being called? Something like 'touch ~/bash_brickwall.txt' – gringer Apr 06 '18 at 06:18
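One way to act on that suggestion is to have the job itself report what it can see. The following is only a sketch: `report_env` is a hypothetical helper name, and where you call it from (a jobscript, a rule's shell command, or `.bashrc`) is up to you.

```shell
# Hypothetical helper: print what the current shell can see, so the output
# of a cluster job can be compared with an interactive session.
report_env() {
    echo "shell: $0"
    echo "PATH: $PATH"
    if command -v conda >/dev/null 2>&1; then
        echo "conda: $(command -v conda)"
    else
        echo "conda: not found"
    fi
}

report_env
```

If the job's log shows `conda: not found` while the login node shows a path, the job's shell is not getting the environment that `module load anaconda` sets up.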

4 Answers

2

I always have to use:

set +u; {params.env}; set -u

(where {params.env} is loading up a conda command from my config.yaml)

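when invoking a conda environment within the shell command of a Snakefile, because Snakemake automatically prepends shell commands with set -u (as part of its bash strict mode, set -euo pipefail), and conda's activation can trip over unset variables.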
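To see why the wrapping matters, here is a small sketch of how the `-u` option behaves. `UNSET_VAR_FOR_DEMO` is a made-up variable name standing in for whatever an activation script might reference before it is defined.

```shell
# Under `set -u`, expanding an unset variable is an error that aborts the
# (sub)shell; with `set +u` the same expansion is silently empty.
unset UNSET_VAR_FOR_DEMO

set -u
if ( : "$UNSET_VAR_FOR_DEMO" ) 2>/dev/null; then strict="ok"; else strict="failed"; fi

set +u
if ( : "$UNSET_VAR_FOR_DEMO" ) 2>/dev/null; then relaxed="ok"; else relaxed="failed"; fi

echo "strict=$strict relaxed=$relaxed"
```

Each expansion runs in a subshell so that the failure under `set -u` does not end the demonstration itself.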

Not sure if this will fix your problem, but worth a spin?

Jon
  • Note that Snakemake can do the env activation for you: http://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management – Johannes Köster Apr 06 '18 at 11:45
  • Thanks Johannes, I am using --use-conda, but it seems that on our nodes, snakemake cannot find conda and that's why it falls over – Mick Watson Apr 09 '18 at 13:17
  • For the record: On our SLURM-based cluster, I have to use `. /PATH/TO/miniconda3/etc/profile.d/conda.sh; set +u; conda activate ./LOCAL_ENV; set -u; BINARY_IN_LOCAL_ENV` as otherwise I get an error that my conda environment is not properly set up. – Shadow Apr 17 '19 at 09:43
2

You can provide a custom "jobscript template", have you tried that? The default one looks like this:

#!/bin/sh
# properties = {properties}
{exec_job}

So perhaps yours could look like this:

#!/bin/bash
# properties = {properties}
module add anaconda
{exec_job}

and then you refer to this file with the --jobscript parameter when you run snakemake.

P.S. if you look in the code the {exec_job} is filled in with a call to python -m snakemake without any PATH setting, which I think contributes to the error you are seeing.
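Putting the pieces together might look roughly like this. It is a sketch under assumptions: the filename `jobscript.sh` and the job count of 10 are arbitrary choices, and the `qsub -S /bin/bash` submission command is taken from the question. The snakemake command is printed rather than executed so the snippet runs without Snakemake or a scheduler.

```shell
# Write the custom jobscript template (the filename is a hypothetical choice).
# The quoted 'EOF' keeps {properties} and {exec_job} literal for Snakemake.
cat > jobscript.sh <<'EOF'
#!/bin/bash
# properties = {properties}
module add anaconda
{exec_job}
EOF

# The invocation that would use the template; only printed here.
echo 'snakemake --use-conda --jobscript jobscript.sh --cluster "qsub -S /bin/bash" --jobs 10'
```

If `module` is only defined after sourcing `/etc/profile`, as described in the question, a `source /etc/profile` line would presumably need to go before `module add anaconda` in the template.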

1

What module does is generally nothing more than modifying PATH and other environment variables. This is also true for conda environments and source activate.

As an example, on our cluster QIIME2 is installed in a conda environment, but its modulefile is

prepend-path    PATH            /opt/sw/qiime/2.2018.2/bin
prepend-path    PYTHONPATH      /opt/sw/qiime/2.2018.2/lib/python3.5/site-packages

while our conda modulefile is

prepend-path    PATH            /opt/sw/conda/3/bin

So assuming MAGpy-3.5 is your conda environment, you could

(a) make a module for your MAGpy pipeline and load it, ignoring that it is a conda environment or

(b) make snakemake run with a modified PATH (I do not know how snakemake deals with environment variables), or

(c) add the path to your conda installation or your MAGpy installation in your .bashrc

Both (b) and (c) defeat the purpose of having a module system IMO, but I've found that anaconda itself is sort of redundant with modulefiles. In our cluster while we install some software with anaconda, we never make the user load them with source activate, and write modulefiles for those instead.
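The effect of a `prepend-path PATH ...` line can be reproduced by hand, which is essentially what options (b) and (c) amount to. A minimal sketch follows; `demo_tool` and the temporary directory are hypothetical stand-ins for conda and its `bin` directory.

```shell
# Create a throwaway directory holding a stand-in executable.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "demo_tool ran"\n' > "$demo_dir/demo_tool"
chmod +x "$demo_dir/demo_tool"

# Before: the directory is not on PATH, so the lookup fails.
if command -v demo_tool >/dev/null 2>&1; then before="found"; else before="missing"; fi

# Manual equivalent of `prepend-path PATH <dir>` in a modulefile.
PATH="$demo_dir:$PATH"
export PATH

# After: the same lookup now succeeds.
if command -v demo_tool >/dev/null 2>&1; then after="found"; else after="missing"; fi

echo "before=$before after=$after"
```

The same reasoning applies to `conda`: if the directory containing it is not on the `PATH` that a job inherits, `which conda` fails regardless of what is installed on the node.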

H. Gourlé
0

I had the same problem and solved it by exporting a path in my submission file to the (hidden) conda directory, which is typically found in your home directory. For example:

export PATH=/home/yourusername:$PATH
annajen