
I am running Snakemake with the `--use-conda` option. Snakemake successfully creates the environment, which should include pysam. I can manually activate this environment and, within it, run my script split_strands.py (which imports pysam) without any problems. However, when I run the Snakemake pipeline, I get the following error log:

```
Activating conda environment: /projects/ps-yeolab3/ekofman/sc_STAMP_pipeline/STAMP/workflow/.snakemake/conda/7c375b6b
/projects/ps-yeolab3/ekofman/sc_STAMP_pipeline/STAMP/workflow/scripts/split_strands.py:166: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if args.output_fwd_bam is not '-':
/projects/ps-yeolab3/ekofman/sc_STAMP_pipeline/STAMP/workflow/scripts/split_strands.py:171: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if args.output_rev_bam is not '-':
Traceback (most recent call last):
  File "/projects/ps-yeolab3/ekofman/sc_STAMP_pipeline/STAMP/workflow/scripts/split_strands.py", line 20, in <module>
    import pysam
ModuleNotFoundError: No module named 'pysam'
[Mon Mar 29 16:41:06 2021]
Error in rule split_strands:
    jobid: 0
    output: 1_split_strands/TWA1_possorted_genome_bam_MD-GTCGCGACACGAGGTA-1.bam.fwd.bam, 1_split_strands/TWA1_possorted_genome_bam_MD-GTCGCGACACGAGGTA-1.bam.rev.bam
    conda-env: /projects/ps-yeolab3/ekofman/sc_STAMP_pipeline/STAMP/workflow/.snakemake/conda/7c375b6b
    shell:

        python scripts/split_strands.py -i /projects/ps-yeolab3/ekofman/sc_STAMP_pipeline/STAMP/workflow/inputs/TWA1_possorted_genome_bam_MD-GTCGCGACACGAGGTA-1.bam -f 1_split_strands/TWA1_possorted_genome_bam_MD-GTCGCGACACGAGGTA-1.bam.fwd.bam -r 1_split_strands/TWA1_possorted_genome_bam_MD-GTCGCGACACGAGGTA-1.bam.rev.bam

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Nodes:        tscc-1-37
```
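The two `SyntaxWarning` lines are not what causes the `ModuleNotFoundError`, but they are worth fixing, and they also carry a clue: CPython only emits this warning from version 3.8 onward. If `envs/python2.7.yaml` really pins Python 2.7, as its name suggests, then the interpreter that compiled the script was not the one from that env. `is not` tests object identity rather than string equality, so whether it matches `'-'` depends on interpreter internals; `!=` is the reliable comparison. A minimal sketch of the corrected check, where `writes_to_file` is a hypothetical helper name, not something in the original script:

```python
def writes_to_file(path):
    """Return True unless the output path is the '-' placeholder value.

    Uses != (value equality) instead of `is not` (object identity), which is
    what the SyntaxWarning for lines 166 and 171 of split_strands.py flags.
    """
    return path != "-"


print(writes_to_file("-"))  # False
print(writes_to_file("1_split_strands/sample.fwd.bam"))  # True
```

In the script itself, the equivalent fixes would be `if args.output_fwd_bam != '-':` and `if args.output_rev_bam != '-':`.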

So although the log says "Activating conda environment", the environment does not actually seem to be active: the import of pysam fails right afterwards, even though I've verified that it succeeds when I activate the same environment manually.

This is how the rule is specified:

```
rule split_strands:
    input: 
        input_bam=config["samples_path"]+"{sample}",
        index=config["samples_path"]+"{sample}.bai"
    output: 
        output_fwd="1_split_strands/{sample}.fwd.bam",
        output_rev="1_split_strands/{sample}.rev.bam"
    conda:
        "envs/python2.7.yaml"
    shell:
        """
        python scripts/split_strands.py -i {input.input_bam} -f {output.output_fwd} -r {output.output_rev}
        """
```

I have verified that the hash 7c375b6b corresponds to the appropriate env specified in python2.7.yaml.

Any ideas what might be happening? My rules are run on a cluster and submitted via qsub commands.

  • Try writing a debugging rule that also uses the env, and output some stuff to a file that you can check. E.g., run a shell command like `python -c 'import sys; print(sys.prefix); print(sys.path)' > env_debug.out`. This can verify what Python is loaded and where it is searching for packages. – merv Mar 30 '21 at 03:34
  • @merv Indeed, I now see in env_debug.out: /home/ekofman/new_anaconda3/envs/snakemake ['', '/home/ekofman/new_anaconda3/envs/snakemake/lib/python39.zip', '/home/ekofman/new_anaconda3/envs/snakemake/lib/python3.9', '/home/ekofman/new_anaconda3/envs/snakemake/lib/python3.9/lib-dynload', '/home/ekofman/new_anaconda3/envs/snakemake/lib/python3.9/site-packages'] None of these are the intended conda env in which the code should be executing; they all belong to the environment (snakemake) from which I launch snakemake. – ekofman Mar 30 '21 at 04:25
  • When I simply added all the required libraries to the environment from which I launch the snakemake command, everything worked fine. So there does seem to be an issue with activating the conda envs from within a snakemake rule. – ekofman Mar 30 '21 at 05:27
  • It’s kinda strange that it is still using your **snakemake** environment, which is not your **base** env. Do you have that activating by default somehow, e.g., through your `~/.bashrc`? Are you using a cluster profile? – merv Mar 30 '21 at 17:51
  • oh I activated that manually prior to running – ekofman Mar 30 '21 at 22:21
  • I have had a similar issue when using multiqc with snakemake's `--use-conda`. The Snakemake version used there was v5.9.1. The error had to do with `PYTHONPATH`, as [discussed here](https://stackoverflow.com/q/17386880/3998252). Using `unset PYTHONPATH` in `shell` right before the multiqc commands solved the issue, but as discussed in the comments over there, I am not sold on whether it is an ideal solution. – Manavalan Gajapathy Mar 31 '21 at 17:31
  • @ManavalanGajapathy PYTHONPATH is incompatible with isolation of Python virtual environments. If you work with Conda envs, your default should be to always have PYTHONPATH clear, and only ever use it when you want to deliberately violate isolation. – merv Apr 05 '21 at 23:02
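merv's one-liner can be grown into a small diagnostic script for a debugging rule that uses the same `conda: "envs/python2.7.yaml"` directive as `split_strands`. This is an illustrative sketch, not part of the original pipeline: the file name `scripts/env_debug.py` and the helper names are hypothetical, and it avoids f-strings and type hints so that it also runs under Python 2.7. It reports the interpreter path, `sys.prefix`, `CONDA_PREFIX` (which `conda activate` sets), `PYTHONPATH`, and whether pysam can be imported, and it can optionally compare `sys.prefix` against the env directory printed in the Snakemake log.

```python
"""Report which Python a Snakemake job really runs, and whether pysam imports.

Hypothetical helper (scripts/env_debug.py); written to run on Python 2.7 and 3.
"""
from __future__ import print_function

import os
import sys


def describe_interpreter():
    """Collect the facts needed to tell which environment is active."""
    try:
        import pysam  # noqa: F401 (only checking that the import works)
        pysam_found = True
    except ImportError:  # ModuleNotFoundError is a subclass on Python 3
        pysam_found = False
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": sys.version.split()[0],
        # Set by `conda activate` to the active env's path; empty if none.
        "conda_prefix": os.environ.get("CONDA_PREFIX", ""),
        # A non-empty PYTHONPATH can pull in packages from another env.
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "pysam_found": pysam_found,
    }


def looks_like_env(info, expected_prefix):
    """True if the running interpreter's prefix is the expected env directory."""
    return os.path.realpath(info["prefix"]) == os.path.realpath(expected_prefix)


if __name__ == "__main__":
    info = describe_interpreter()
    for key in sorted(info):
        print("%s: %s" % (key, info[key]))
    # Optional argument: the .snakemake/conda/<hash> path from the log.
    if len(sys.argv) > 1:
        print("inside expected env: %s" % looks_like_env(info, sys.argv[1]))
```

For example, the debugging rule's shell command could be `python scripts/env_debug.py /projects/ps-yeolab3/ekofman/sc_STAMP_pipeline/STAMP/workflow/.snakemake/conda/7c375b6b > env_debug.out`. If `prefix` still points at the `snakemake` env, the activation did not take effect inside the job; a non-empty `pythonpath` would point toward the `PYTHONPATH` problem discussed above.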

1 Answer


It turns out that newer versions of snakemake (6.0.0 and later) seem to have some issue with this. I switched to snakemake 5.8.2 instead and everything works fine. I'm not sure exactly what's going on under the hood, but it looks identical to this issue: https://github.com/snakemake/snakemake/issues/883

  • Are you also manually writing a `qsub` like in that GitHub issue? I use Snakemake+Conda on both LSF and SLURM systems (using [cluster profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles)) and haven't had any issues from 5.10-5.31 (haven't updated yet to later versions). – merv Mar 30 '21 at 22:32