2

I'm trying to get biotools working on my Mac so that I can run some Perl5 code that uses Bio::DB::Sam, but am stymied.

  • Mac OS X 10.10.4

  • perl 5.18.2

  • upgraded CPAN as per INSTALL instructions

  • 'brew install expat' tells me that expat-2.1.0_1 is already installed

  • 'sudo perl -MCPAN -e shell'

  • 'install CJFIELDS/BioPerl-1.6.924.tar.gz'

  • 'Do you want to run the Bio::DB::GFF or Bio::DB::SeqFeature::Store live database tests?' => 'n'

  • Install all

  • 'Do you want to run tests that require connection to servers across the internet' => 'n'

Eventually get (with some lines deleted):

Running Build test
t/Align/AlignStats.t ................... ok     
t/Align/AlignUtil.t .................... ok     
t/Align/Graphics.t ..................... skipped: The optional module GD (or dependencies thereof) was not installed
...
t/AlignIO/msf.t ........................ ok   
t/AlignIO/nexml.t ...................... skipped: The optional module Bio::Phylo (or dependencies thereof) was not installed
t/AlignIO/nexus.t ...................... ok     
...
t/Assembly/ContigSpectrum.t ............ ok       
t/Assembly/IO/bowtie.t ................. skipped: The optional module Bio::DB::Sam (or dependencies thereof) was not installed
t/Assembly/IO/sam.t .................... skipped: The optional module Bio::DB::Sam (or dependencies thereof) was not installed
t/Assembly/core.t ...................... ok       
t/Cluster/UniGene.t .................... ok   

Afterwards, I test with:

perl -e "use Bio::DB::Sam;"

and get:

Can't locate Bio/DB/Sam.pm in @INC (you may need to install the Bio::DB::Sam module) (@INC contains: /Library/Perl/5.18/darwin-thread-multi-2level /Library/Perl/5.18 /Network/Library/Perl/5.18/darwin-thread-multi-2level /Network/Library/Perl/5.18 /Library/Perl/Updates/5.18.2/darwin-thread-multi-2level /Library/Perl/Updates/5.18.2 /System/Library/Perl/5.18/darwin-thread-multi-2level /System/Library/Perl/5.18 /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level /System/Library/Perl/Extras/5.18 .) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

I get the same results when cloning bioperl-live from GitHub (sync'd to revision 73c446c69a77) and trying to install that way.

Note that I have installed samtools 0.1.18 (to match the version on our cluster) by:

  • downloading the .tar.gz

  • running 'make'

  • copying 'samtools', 'bcftools/bcftools', and 'misc/*.pl' to ~/debarcer-packages/bin, which is on my path

Afterward, I get this:

$ which samtools
/Users/gvwilson/debarcer-packages/bin/samtools

This build did not produce a '.so' file, even though there is a rule in the samtools-0.1.18 Makefile that looks like it (maybe?) ought to produce one.

Greg Wilson
  • 1
    `$ which perl` ... Also, don't mess with the system `perl`. Install one for yourself and use it. – Sinan Ünür Mar 18 '16 at 16:17
  • `which perl` reports `/usr/bin/perl` – Greg Wilson Mar 18 '16 at 18:44
  • 1
    You need to install [Bio-SamTools](https://metacpan.org/release/Bio-SamTools) ... but, as I said, you will be much better off if you build your own `perl` and use that instead of messing with the system Perl installation. – Sinan Ünür Mar 18 '16 at 22:51
  • 1
    Out of curiosity, why can't you use this on the cluster where the data is? Moving these huge (sometimes ~100 GB) files around and trying to do the work on your personal machine will be slow and cause you other headaches. The answer I provided will allow you to do all the work on the cluster (just one way). To answer your last question, ".so" is on Linux and the library file on your Mac is called "libbam.a" (you'll see that after compiling samtools). – SES Mar 24 '16 at 22:06

2 Answers

2

The module Bio::DB::Sam provides bindings to an older version of samtools that did not rely on htslib. This is an important point because you may run into issues with SAM/BAM files generated by current versions of samtools or other aligners, since most tools use htslib these days.

For building the module, you are on the right track with the version you are using but it is difficult to build if you do not know the correct flags. I previously provided a solution to do this and I'll show a better way here (just use a package manager for the Perl module).

wget http://sourceforge.net/projects/samtools/files/samtools/0.1.18/samtools-0.1.18.tar.bz2
tar xjf samtools-0.1.18.tar.bz2 && cd samtools-0.1.18
make CFLAGS=-fPIC
export SAMTOOLS=`pwd`

The last command lets the Perl module's build find samtools without prompting you for its location. Note that the extra CFLAGS argument may not be needed on your Mac, so try without it first. It is required on Linux, and since the module uses so much memory you will likely only be using this on a Linux machine. Now, install the Perl module.

cpanm Bio::DB::Sam

or cpan if you prefer. That should get you a working Bio::DB::Sam. I don't know what you are trying to do but I will mention that the fine folks over at EBI have developed bindings to htslib called Bio::DB::HTS based on Lincoln Stein's XS code in the Bio::DB::Sam module. This is really what you should be using because the version of SAMtools mentioned above is really old and not being developed. That is my opinion and a word of caution though, nothing wrong with Bio::DB::Sam.

edit:

You may find it easier to manage Perl without using the "system" Perl, and here is one solution. Other people may have their preferred method, but perlbrew (combined with cpanminus) will make this type of work fun and much less of a pain (and they are popular choices). That would be my first step: set up perlbrew, install Perl 5.22, then install cpanminus. That might sound challenging but it is just a few commands. Something along the lines of:

curl -L http://install.perlbrew.pl | bash
source ~/perl5/perlbrew/etc/bashrc
perlbrew install perl-5.22.1
perlbrew switch perl-5.22.1
perlbrew install-cpanm

should do the trick. That will give you a kick-ass Perl with some nice features not available with your "system" Perl. This is a good idea because using /usr/bin/perl requires sudo, it involves messing with the system libraries which might cause an issue, and the recent Apple changes mean that working with root directories/libraries is completely unstable.
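
After switching, it is worth checking that the shell really picks up the new Perl rather than `/usr/bin/perl`. A minimal sketch, assuming the system Perl lives at `/usr/bin/perl` as reported in the question's comments (`is_system_perl` is a hypothetical helper name):

```shell
# Hypothetical helper: succeeds if the given path is the system perl.
# Assumes the system perl is /usr/bin/perl, as on the asker's Mac.
is_system_perl() {
    [ "$1" = "/usr/bin/perl" ]
}

# Resolve the perl that the current shell would run (empty if none).
active_perl=$(command -v perl || true)
if is_system_perl "$active_perl"; then
    echo "still using the system perl: $active_perl"
else
    echo "not the system perl: ${active_perl:-no perl found on PATH}"
fi
```

If this still reports the system Perl, the perlbrew `bashrc` line was probably not sourced in the current shell.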

Community
SES
  • 1
    Why not just `cpan Bio::SamTools`? Is there something weird going on there? – brian d foy Mar 24 '16 at 06:29
  • @brian d foy That won't work because this is an XS interface to samtools, so you need to compile that library with a specific flag and set the environment variable as shown above. – SES Mar 24 '16 at 16:34
  • Why all the downvotes? This works and I understand the libraries well. If someone thinks they can do it better or provide a different solution then please show it. If the downvotes are because I mentioned perlbrew, that was only a suggestion. Use whatever Perl you like. Downvoting a working solution, and the only answer, for reasons not related to the question seems petty to me. Please explain what you think the issue is here. – SES Mar 24 '16 at 16:45
  • @SinanÜnür with regards to the edit, the slash is there for a reason (see [this question](http://stackoverflow.com/questions/15691977/why-start-a-shell-command-with-a-backslash) for a description). Though, I guess I'll remove that part entirely. I didn't think mentioning perlbrew would be such a contentious issue, just trying to help. – SES Mar 24 '16 at 17:09
  • 1
    @SES: What doesn't work with `cpan`? You wouldn't use a Perl module installer to install samtools. You use it to install the Perl module. If there's something that's not working, I'd like to fix that. – brian d foy Mar 24 '16 at 17:39
  • 1
    I'm the author of the `cpan` tool. You didn't use it where I would expect someone to use it so I asked if there was something wrong with the tool. Curiously, there's an [Alien::SamTools](http://www.metacpan.org/module/Alien::SamTools) Perl package that `cpan` can use to install samtools-0.1.19. If you have problems with the `cpan` tool not doing what it should do for you, please let us know so we can fix it. Good luck, – brian d foy Mar 24 '16 at 18:20
  • @briandfoy You are correct, you can use `cpan` and vanilla `make` for samtools on a Mac. This won't work on Linux though, which is an important distinction because most work will be done on a Linux cluster. This is odd, usually it is not the case that compiling a tool and installing the module/interface is easier on Mac, usually it is the other way around. – SES Mar 24 '16 at 19:03
  • @SinanÜnür I'll update my answer to be more accurate, there are some important details that have been missed and it could be better. – SES Mar 24 '16 at 20:23
  • @briandfoy `cpan` for installing the Perl library works fine after the correct installation of the required C library and I updated my answer to reflect that. I thought you were asking why not *just* use `cpan` and skip the other steps. My response is that they are required (the C library part) beforehand, so `cpan` alone won't work. Hopefully that provides some clarity. – SES Mar 24 '16 at 21:03
  • @SinanÜnür I have to ask you to reconsider your comment about deleting the answer and me being confused. I'm not confused, I know these libraries and the Perl toolchain quite well. There was a misunderstanding is all. And, we discussed it and arrived at an understanding. – SES Mar 24 '16 at 21:16
  • 1
    Your answer was confusing, to say the very least, given the certainty with which you claimed `cpan` or `cpanm` would not work on Linux. I've deleted my comment, but I can't undo my vote. If a third person comes and votes to delete, it will be deleted. If that happens, you can flag it for moderator attention. – Sinan Ünür Mar 24 '16 at 22:40
  • @SinanÜnür Thank you for the response and I'll delete mine also. It should be clear now that I only meant the C library is required. In my defense, brian could have said "after installing the C library..." or "why not use `cpan` for the module installation part" and that would have been more clear what he meant. I think it was a misunderstanding on both parts, but I feel like it's been resolved. It would be ironic if my post is deleted since I've been working on this module :-/ – SES Mar 24 '16 at 22:54
2

You need to install an additional (optional) module to use samtools. That's what the "The optional module Bio::DB::Sam (or dependencies thereof) was not installed" message is about. You don't need it for the rest of BioPerl, so it's not a hard dependency.

For Bio::DB::Sam, you need samtools-0.1.17 (the latest version the module works with according to its docs). I downloaded the source and ran make. There were some warnings, but it appears to work. From your question, I don't think you had a problem here.

I then installed Bio::DB::Sam:

$ cpan Bio::DB::Sam

There were some compiler warnings, but the module passed its tests and installed. The cpan command took care of dependencies too, so it also installed BioPerl for me.

If you need some environment variables, you can set them for a one-off run of the command:

$ CFLAGS=... SAMTOOLS=... cpan Bio::DB::Sam

Note that installing Bio::DB::Sam prompted me for the location of samtools. I pointed it at the build directory:

$ cpan5.22.0 Bio::DB::Sam
Running install for module 'Bio::DB::Sam'
Configuring L/LD/LDS/Bio-SamTools-1.43.tar.gz with Build.PL
This module requires samtools 0.1.10 or higher (samtools.sourceforge.net).
Please enter the location of the bam.h and compiled libbam.a files: /Users/brian/Downloads/samtools-0.1.17
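
Since the prompt asks for a directory containing both bam.h and libbam.a, it can save a failed build to check for them first. A rough sketch, in which `samtools_dir_ok` is a hypothetical helper and the example path is an assumption to be replaced with your own build directory:

```shell
# Hypothetical helper: succeeds only if the directory holds both files
# the Bio-SamTools build asks for (bam.h and the compiled libbam.a).
samtools_dir_ok() {
    [ -f "$1/bam.h" ] && [ -f "$1/libbam.a" ]
}

# Example path is an assumption; substitute your own build directory.
dir="$HOME/Downloads/samtools-0.1.17"
if samtools_dir_ok "$dir"; then
    export SAMTOOLS="$dir"
    echo "SAMTOOLS set to $SAMTOOLS"
else
    echo "bam.h or libbam.a not found in $dir" >&2
fi
```

If libbam.a is missing, the samtools `make` step most likely did not finish, and the module build will not be able to link.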

I'm betting the fix isn't anything as complicated as the answer SES gave. You just need an optional module. The README for Bio::DB::Sam notes some problems that people might have and offers some workarounds, but I didn't run into these problems and my setup is close to yours.

Note that Alien::SamTools is a Perl package that installs the non-Perl samtools, but it says it installs 0.1.19. Maybe that works too, but that's not what Bio::DB::Sam says on the tin.

HoldOffHunger
brian d foy
  • This won't work on Linux because `./Build` will fail. That is the reason for my answer, maybe I could explain that in detail. – SES Mar 24 '16 at 18:53
  • 1
    We're not on Linux. We're on Mac OS X, as stated in the question. – brian d foy Mar 24 '16 at 19:19
  • 1
    Out of curiosity, I tried building the module with `samtools 0.1.17` on my [ArchLinux install](https://www.nu42.com/2011/11/laptop-attachment-syndrome.html). Everything works like a charm using `cpanm` (which I prefer) ***and*** `cpan`. See [Build Bio::DB::Sam on ArchLinux.txt](https://gist.github.com/nanis/6dc81e67b80d833bf68e) – Sinan Ünür Mar 24 '16 at 20:06
  • Yes, OP is on a Mac. I was responding to your comment about why the extra steps in my answer since you mentioned me by name. These steps are required on RHEL/CentOS (used by every institute/university I've worked for) or the build will fail. It is worth mentioning again that this version of samtools and the interface are still usable but not in development. We'd probably be better to focus efforts on the latest versions but that is another issue altogether. – SES Mar 24 '16 at 20:10
  • @SinanÜnür after all the discussion, you are saying my answer works like a charm? ;-) – SES Mar 24 '16 at 20:15
  • 1
    No, Sinan is saying that my `cpan` tool does the job just fine on ArchLinux, which is the opposite of what you are saying. – brian d foy Mar 24 '16 at 20:18
  • brian, I was referring to the compiler flag, which was not part of your original answer. That is important, and it was a joke of course, for fun – SES Mar 24 '16 at 20:21
  • 1
    @SES The compiler flag is part of building the C library `samtools`. If the library is built correctly, both `cpan` and `cpanm` will work. You told brian that one could not just use `cpan` or `cpanm`. Which is it? – Sinan Ünür Mar 24 '16 at 20:37
  • @SinanÜnür that's right. I think there is some confusion. I showed how to compile samtools and properly link Bio::DB::Sam. brian asked why not just install the module with `cpan` and I said it won't work because it relies on the C library to be built, which for most systems requires special flags to build properly (the module). After you compile the C library you can use `cpan` or `cpanm` and I updated my answer to reflect that. The C library is not distributed with the module so you can't just say `cpan Bio::DB::Sam`. Is that clear now? – SES Mar 24 '16 at 20:43
  • 1
    @SES ... sooo, what was the point of claiming `cpan` or `cpanm` wouldn't work? If a specific compiler flag is needed with gcc (not necessarily only Linux), then come out and say it instead of claiming x or y does not work. BTW, note that BioPerl does not pass tests with `perl` 5.23.9 ... But, `Bio::DB::Sam` builds and installs fine using `cpan` and `cpanm` on CentOS 7 as well. – Sinan Ünür Mar 24 '16 at 21:01
  • @SinanÜnür I answered this below. My point was `cpan` alone won't work, the C library is required and hopefully this is clear now. brian asked me why not just use `cpan` and suggested in his answer that I'm making it more complicated. I tried to explain why not just `cpan` but it was misunderstood somehow. I know about the BioPerl test failures and I've discussed this on github. I want to fix those but like a lot of people, it's hard to spend time on an unpaid project. – SES Mar 24 '16 at 21:10
  • 1
    You can install samtools with `cpan` using Alien::SamTools, as I noted. – brian d foy Mar 24 '16 at 21:41