
I'm exploring some bioinformatics data and I like to use R notebooks (i.e. Rmarkdown) when I can. Right now, I need to use a command line tool to analyze a VCF file and I would like to do it through a Bash code chunk in the Rmarkdown notebook.

The problem is that the command I want to use was installed with conda into my conda environment. The tool is bcftools. When I try to access this command, I get this error (code chunk commented out to show rmarkdown code chunk format):

#```{bash}
bcftools view -H test.vcf.gz
#```
/var/folders/9l/phf62p1s0cxgnzp4hgl7hy8h0000gn/T/RtmplzEvEh/chunk-code-6869322acde0.txt: line 3: bcftools: command not found

Whereas if I run from Terminal, I get output (using conda environment called "binfo"):

> bcftools view -H test.vcf.gz | head -n 3
chr10   78484538    .   A   C   .   PASS    DP=57;SOMATIC;SS=2;SSC=16;GPV=1;SPV=0.024109    GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:34:33:0:0%:0,33,0,0   0/1:.:23:19:4:17.39%:1,18,0,4
chr12   4333138 .   G   T   .   PASS    DP=119;SOMATIC;SS=2;SSC=14;GPV=1;SPV=0.034921   GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:72:71:1:1.39%:71,0,1,0    0/1:.:47:42:5:10.64%:42,0,5,0
chr15   75086860    .   C   T   .   PASS    DP=28;SOMATIC;SS=2;SSC=18;GPV=1;SPV=0.013095    GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:15:15:0:0%:4,11,0,0   0/1:.:13:8:5:38.46%:5,3,1,4
(binfo)

So, how do I access tools installed with conda/in my conda env from an R notebook/Rmarkdown bash code chunk? I searched for quite a while and could not find anyone talking about running conda commands in a shell chunk in Rmarkdown. Any help would be appreciated because I like the R notebook format for exploratory analysis.

Leo

2 Answers


Passing Arguments to Engines

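If Conda is already configured for bash (i.e., you have run conda init bash), then you can use engine.opts to launch bash as a login shell, which reads ~/.bash_profile. That is where conda init places its setup on macOS; on Linux it goes in ~/.bashrc, which a login shell picks up only if ~/.bash_profile or ~/.profile sources it: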

bash

```{bash engine.opts='-l'}
bcftools view -H test.vcf.gz
```
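
If the chunk still cannot find the tool, it may help to check what the chunk's shell actually sees on PATH. The following is a minimal diagnostic sketch to paste into a bash chunk, not part of knitr or Conda; `find_tool` is a hypothetical helper name.

```shell
# find_tool is a hypothetical helper: report where a command resolves, if at all.
find_tool() {
  if location=$(command -v "$1" 2>/dev/null); then
    echo "$1: $location"
  else
    echo "$1: not found on PATH"
  fi
}

# Check the tool from the question.
find_tool bcftools
```

Running the same chunk with and without engine.opts='-l' should show whether the login-shell startup files are what put the Conda environment on PATH.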

zsh

If you are working with zsh (e.g., macOS 10.15 Catalina users, where zsh is the default shell), then the interactive flag, --interactive|-i, is what you want (Credit: @Leo).

```{zsh engine.opts='-i'}
bcftools view -H test.vcf.gz
```

Again, this presumes you've previously run conda init zsh to set up Conda to work with the shell.

Note on Reproducibility

Since reproducibility is usually a concern in scientific work, you may also want to capture the state of your Conda environment. For example, if you are working under version control, run conda env export > environment.yaml and commit the resulting file. Another option is to print that information directly at the end of the Rmd, as is commonly done with sessionInfo(). That is,

```{bash engine.opts='-l', comment=NA}
conda env export
```

Here, comment=NA removes the comment prefix that knitr normally adds to each output line, so that the output can be cleanly copied from the rendered version.

merv
  • I actually could not get this to work with `zsh`. My chunk header is: `{zsh, engine.opts='-l'}`. I get the same error I was running into originally: `/var/folders/dr/tx09yq8n7s9b2mnlyh5lhxvr0000gn/T/RtmpxVYtGE/chunk-code-c6e560c02889.txt:2: command not found: bcftools`. The only way I've figured out to use zsh is to add a `source ~/.zshrc` line at the start of the chunk so it can find the `bcftools` binary in `/Users//anaconda3/bin/bcftools`. Is there an equivalent way for zsh to automatically source `~/.zshrc`, the way Bash sources `~/.bash_profile` with `'-l'`? – Leo Oct 18 '19 at 20:12
  • I got it to work by switching the engine options from login (`{zsh, engine.opts='-l'}`) to interactive (`{zsh, engine.opts='-i'}`). Likely due to the differences in how Zsh sources dotfiles compared to Bash. – Leo Oct 18 '19 at 20:34
  • @Leo thanks for the note. I updated the answer with that info. – merv Oct 18 '19 at 20:47

Quick solution for bash: prepend the following init script to your Bash scripts.

eval "$(command conda 'shell.bash' 'hook' 2> /dev/null)"

# you may need to activate the "base" environment explicitly
conda activate base
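
Note that the hook line itself needs `conda` to be on PATH, which may not be the case inside an Rmarkdown chunk. One possible workaround is to call the Conda executable by its full path. The sketch below assumes an install at `$HOME/anaconda3/bin/conda`, which may differ on your machine; `init_conda` is a hypothetical helper name, while `binfo` is the environment from the question.

```shell
# init_conda is a hypothetical helper; the default path below is an assumption.
init_conda() {
  conda_bin="${1:-$HOME/anaconda3/bin/conda}"
  if [ ! -x "$conda_bin" ]; then
    echo "conda not found at $conda_bin" >&2
    return 1
  fi
  # Ask conda for its bash hook and load it into the current shell,
  # which defines the conda shell function (including "conda activate").
  eval "$("$conda_bin" shell.bash hook)"
}

# Only activate the environment and run the tool if the hook loaded.
if init_conda; then
  conda activate binfo
  bcftools view -H test.vcf.gz | head -n 3
fi
```

Pass a different path as the first argument (e.g., `init_conda /path/to/conda`) if Conda lives elsewhere.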

Detail

When you open a terminal, it spawns an interactive shell, but a script (including an Rmarkdown bash chunk) runs in a non-interactive shell. A non-interactive Bash shell does not read ~/.bashrc, so the Conda initialization is skipped and the "base" environment is never added to PATH.
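
To see which kind of shell a chunk or script is running in, you can inspect the shell's option flags in `$-`, which contain `i` for an interactive shell. This is a minimal sketch; `shell_mode` is a hypothetical helper name, and its optional argument exists only so that the check can be tried on an arbitrary flag string.

```shell
# shell_mode is a hypothetical helper: classify a shell from its option flags.
# With no argument it inspects the current shell's "$-".
shell_mode() {
  if [ "$#" -gt 0 ]; then
    flags="$1"
  else
    flags="$-"
  fi
  case "$flags" in
    *i*) echo "interactive" ;;
    *)   echo "non-interactive" ;;
  esac
}

# Report the mode of the shell running this code.
shell_mode
```

In a terminal this would be expected to report an interactive shell, while inside a script or a plain bash chunk it should report a non-interactive one.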


Simba
  • I tried this but I'm still getting an error when even trying to access the `conda activate` command to activate my env: `# Test Bash chunk ```{bash} eval "$(command conda 'shell.bash' 'hook' 2> /dev/null)" conda activate binfo bcftools ``` /var/folders/9l/phf62p1s0cxgnzp4hgl7hy8h0000gn/T/Rtmpcy5EuN/chunk-code-80345e779e.txt: line 2: conda: command not found /var/folders/9l/phf62p1s0cxgnzp4hgl7hy8h0000gn/T/Rtmpcy5EuN/chunk-code-80345e779e.txt: line 3: bcftools: command not found ` – Leo Sep 27 '19 at 16:38