I want to create Venn diagrams with pybedtools. It ships a script called venn_mpl that uses matplotlib. It works perfectly when I try it in my Jupyter notebook, either from Python or with shell commands.

Unfortunately, something goes wrong when I use it in my Snakefile, and I can't figure out what the problem is.

First of all, this is the script: venn_mpl.py

#!/gnu/store/3w3nz0h93h7jif9d9c3hdfyimgkpx1a4-python-wrapper-3.7.0/bin/python
"""
Given 3 files, creates a 3-way Venn diagram of intersections using matplotlib; \
see :mod:`pybedtools.contrib.venn_maker` for more flexibility.

Numbers are placed on the diagram.  If you don't have matplotlib installed.
try venn_gchart.py to use the Google Chart API instead.

The values in the diagram assume:

    * unstranded intersections
    * no features that are nested inside larger features
"""

import argparse
import sys
import os
import pybedtools

def venn_mpl(a, b, c, colors=None, outfn='out.png', labels=None):
    """
    *a*, *b*, and *c* are filenames to BED-like files.

    *colors* is a list of matplotlib colors for the Venn diagram circles.

    *outfn* is the resulting output file.  This is passed directly to
    fig.savefig(), so you can supply extensions of .png, .pdf, or whatever your
    matplotlib installation supports.

    *labels* is a list of labels to use for each of the files; by default the
    labels are ['a','b','c']
    """
    try:
        import matplotlib.pyplot as plt
        from matplotlib.patches import Circle
    except ImportError:
        sys.stderr.write('matplotlib is required to make a Venn diagram with %s\n' % os.path.basename(sys.argv[0]))
        sys.exit(1)

    a = pybedtools.BedTool(a)
    b = pybedtools.BedTool(b)
    c = pybedtools.BedTool(c)

    if colors is None:
        colors = ['r','b','g']

    radius = 6.0
    center = 0.0
    offset = radius / 2

    if labels is None:
        labels = ['a','b','c']

Then my code:

rule venndiagramm_data:
     input:
         data = expand("bed_files/{sample}_peaks.narrowPeak", sample=config["samples"]["data"])
     output:
         "figures/Venn_PR1_PR2_GUI_data.png"
     run:
         col = ['g','k','b']
         lab = ['PR1_data','PR2_data','GUI_data']
         venn_mpl(input.data[0], input.data[1], input.data[2], colors = col, labels = lab, outfn = output)

The error is:

SystemExit in line 62 of snakemake_generatingVennDiagramm.py:
1

The snakemake-log only gives me:

rule venndiagramm_data:
    input: bed_files/A_peaks.narrowPeak,bed_files/B_peaks.narrowPeak, bed_files/C_peaks.narrowPeak
    output: figures/Venn_PR1_PR2_GUI_data.png
    jobid: 2

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

I already tried adding the following, as described in the documentation:

rule error:
  shell:
    """
    set +e
    somecommand ...
    exitcode=$?
    if [ $exitcode -eq 1 ]
    then
        exit 1
    else
        exit 0
    fi
    """

but this changed nothing.

My next idea was to run it as a shell command instead, which I had also tested before and which worked perfectly. But then I got a different, though I think quite similar, error message, for which I also couldn't find a proper solution:

rule venndiagramm_data_shell:
    input:
        data = expand("bed_files/{sample}_peaks.narrowPeak", sample=config["samples"]["data"])
    output:
        "figures/Venn_PR1_PR2_GUI_data.png"
    shell:
        "venn_mpl.py -a {input.data[0]} -b {input.data[1]} -c {input.data[2]} --color 'g,k,b' --labels 'PR1_data,PR2_data,GUI_data'"

The snakemake log:

[Thu May 23 16:37:27 2019]
rule venndiagramm_data_shell:
    input: bed_files/A_peaks.narrowPeak, bed_files/B_peaks.narrowPeak, bed_files/C_peaks.narrowPeak
    output: figures/Venn_PR1_PR2_GUI_data.png
    jobid: 1

[Thu May 23 16:37:29 2019]
Error in rule venndiagramm_data_shell:
    jobid: 1
    output: figures/Venn_PR1_PR2_GUI_data.png

RuleException:
CalledProcessError in line 45 of snakemake_generatingVennDiagramm.py:
Command ' set -euo pipefail;  venn_mpl.py -a input.data[0] -b input.data[1] -c input.data[2] --color 'g,k,b' --labels 'PR1_data,PR2_data,GUI_data' ' returned non-zero exit status 1.

Does anyone have an idea what the reason for this could be and how to fix it?

FYI: I said that I tested it without running it through Snakemake. This is my working code:

from snakemake.io import expand
import yaml
import pybedtools
from pybedtools.scripts.venn_mpl import venn_mpl

config_text_real = """ 
samples:
    data:
    - A
    - B
    - C
    control:
    - A_input 
    - B_input
    - C_input
"""
config_vennDiagramm = yaml.safe_load(config_text_real)
config = config_vennDiagramm

data = expand("{sample}_peaks.narrowPeak", sample=config["samples"]["data"])
col = ['g','k','b']
lab = ['PR1_data','PR2_data','GUI_data']
venn_mpl(data[0], data[1], data[2], colors = col, labels = lab, outfn = 'Venn_PR1_PR2_GUI_data.png')

control = expand("{sample}_peaks.narrowPeak", sample=config["samples"]["control"])
lab = ['PR1_control','PR2_control','GUI_control']
venn_mpl(control[0], control[1], control[2], colors = col, labels = lab, outfn = 'Venn_PR1_PR2_GUI_control.png')

And in my Jupyter notebook, using the shell:

!A='../path/to/file/A_peaks.narrowPeak'
!B='../path/to/file/B_peaks.narrowPeak'
!C='../path/to/file/C_peaks.narrowPeak'
!col=g,k,b
!lab='PR1_data, PR2_data, GUI_data'
!venn_mpl.py -a ../path/to/file/A_peaks.narrowPeak -b ../path/to/file/B_peaks.narrowPeak -c ../path/to/file/C_peaks.narrowPeak --color "g,k,b" --labels "PR1_data, PR2_data, GUI_data"

The reason I used the full path instead of the variable is that, for some reason, the code didn't work when I referenced the variable as "$A".

VcFbnne
  • Please provide more data. First, it looks like the problem is in the script venn_mpl.py, not the Snakefile itself. Without seeing the code of this script we can't say anything concrete. Next, there could be a problem if the number of samples in your config file differs from the number the script expects. There should be at least 3 files, but is that true? `input.data[0], input.data[1], input.data[2]` – Dmitry Kuzminov May 24 '19 at 19:57
  • The number of input files is correct. And as seen in the error message, every file is also correctly interpreted – VcFbnne May 27 '19 at 12:31
  • I edited the post and provided the code of the script at the top. I didn't write it myself. It's part of the pybedtools package. – VcFbnne May 27 '19 at 13:06

1 Answer

Not sure if this fixes it, but one thing I notice is that:

shell:
    "venn_mpl.py -a input.data[0] -b input.data[1] -c input.data[2]..." 

probably should be:

shell:
    "venn_mpl.py -a {input.data[0]} -b {input.data[1]} -c {input.data[2]}..." 
dariober
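
The difference between the two shell strings above can be illustrated with Python's built-in `str.format`, used here only as an analogy. This is a minimal sketch that assumes Snakemake substitutes `{...}` placeholders in a comparable way for this case; `rule_input` is a hypothetical stand-in, not Snakemake's real `input` object.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the rule's `input` object (not Snakemake's real class).
rule_input = SimpleNamespace(
    data=[
        "bed_files/A_peaks.narrowPeak",
        "bed_files/B_peaks.narrowPeak",
        "bed_files/C_peaks.narrowPeak",
    ]
)

# Without braces the text is passed through literally.
without_braces = "venn_mpl.py -a input.data[0]".format(input=rule_input)

# With braces the placeholder is replaced by the first file name.
with_braces = "venn_mpl.py -a {input.data[0]}".format(input=rule_input)

print(without_braces)
print(with_braces)
```

The first string stays unchanged, which matches the literal `input.data[0]` visible in the failing command of the log, while the second one contains the actual file path.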
  • Thanks! Indeed, this was an issue with the shell version. Unfortunately, something else is still wrong, because it still gives me the same error. – VcFbnne May 27 '19 at 12:27
  • I updated the post and added the code which worked for me outside of snakemake – VcFbnne May 27 '19 at 13:15
  • One more thing... Your shell command doesn't include the output file. It should be: `venn_mpl.py -o {output} -a {input.data[0]} ...`. As it is, it would make snakemake fail since the requested output `figures/Venn_PR1_PR2_GUI_data.png` is not produced. But this is unlikely to be the cause of your problem. Can you show the snakemake command you executed? Also, add `-p/--printshellcmds` to it so you see exactly what snakemake is executing, copy and paste the `venn_mpl.py ...` command to the shell and check for errors and the exit code (with `echo $?` immediately after it has finished) – dariober May 27 '19 at 13:44
  • Thanks a lot for the tip about using -p. This helps so much! In this case it gave me "matplotlib is required to make a Venn diagram with .venn_mpl.py-real". This is strange for two reasons: first, at the beginning of the Snakefile I have "import matplotlib"; second, when I run this as a shell command in my Jupyter notebook, it works. To execute it, I use the terminal from the Jupyter notebook and "snakemake -s snakefile.py" – VcFbnne May 27 '19 at 17:10
  • _matplotlib is required ..._ It may be that the python version that executes `venn_mpl.py` (i.e. the one in the [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) line) is different from the one you have on your PATH and it doesn't have matplotlib installed. Try `python /path/to/venn_mpl.py ...` and/or `python3 ...`. Also check which python executable runs Jupyter and use that one. (Just guessing...) – dariober May 27 '19 at 20:25
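
One way to check the guess from the last comment is to compare the interpreter that is currently running with the one named in the script's shebang line. The following is a minimal sketch, not part of pybedtools or Snakemake; the helper names `read_shebang` and `describe_environment` are hypothetical, and it assumes `venn_mpl.py` can be found on the PATH.

```python
import importlib.util
import shutil
import sys


def read_shebang(script_path):
    """Return the interpreter named on the script's first line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    return first_line[2:].strip() if first_line.startswith("#!") else None


def describe_environment(script_name="venn_mpl.py"):
    """Collect the facts needed to compare the two interpreters."""
    script_path = shutil.which(script_name)  # None if the script is not on PATH
    return {
        "current_python": sys.executable,
        "matplotlib_importable": importlib.util.find_spec("matplotlib") is not None,
        "script_path": script_path,
        "script_shebang": read_shebang(script_path) if script_path else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this once in the Jupyter kernel and once with the interpreter that launches Snakemake should show whether `current_python` and `script_shebang` point to different installations, and whether matplotlib is importable in each.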