
I want to convert multiple FASTA files (DNA sequences) to NEXUS format using the Bio.SeqIO module, but I get this error:

```
Traceback (most recent call last):
  File "fasta2nexus.py", line 28, in <module>
    print(process(fullpath))
  File "fasta2nexus.py", line 23, in process
    alphabet=IUPAC.ambiguous_dna)
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 1003, in convert
    with as_handle(in_file, in_mode) as in_handle:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/Library/Python/2.7/site-packages/Bio/File.py", line 88, in as_handle
    with open(handleish, mode, **kwargs) as fp:
IOError: [Errno 2] No such file or directory: 'c'
```

What am I missing?

Here is my code:

```python
#!/usr/bin/env python

from __future__ import print_function # or just use Python 3!

import fileinput
import os
import re
import sys

from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC


test = "/Users/teton/Desktop/test"

files = os.listdir(os.curdir)

def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0]
    return SeqIO.convert(filename, "fasta", 
                         base + ".nex", "nexus", 
                         alphabet=IUPAC.ambiguous_dna)

for files in os.listdir(test):
    for file in files:
        fullpath = os.path.join(file)
        print(process(fullpath))
```
Asked by Ramon; edited by MattDMo.

2 Answers


This code should solve the majority of problems I can see.

```python
from __future__ import print_function # or just use Python 3!

import fileinput
import os
import re
import sys

from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC

test = "/Users/teton/Desktop"

def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0]
    return SeqIO.convert(filename, "fasta", 
                         base + ".nex", "nexus", 
                         alphabet=IUPAC.ambiguous_dna)

for root, dirs, files in os.walk(test):
    for file in files:
        fullpath = os.path.join(root, file)
        print(process(fullpath))
```

I changed a few things. First, I ordered your imports (a personal preference) and made sure to import `IUPAC` from `Bio.Alphabet` so you can assign the correct alphabet to your sequences: FASTA files don't record whether they hold DNA, RNA, or protein, but the NEXUS writer needs to know. Next, in your `process()` function, I added a line to split the extension off the filename, then used the full filename for the first argument and just the base (without the extension) for naming the NEXUS output file. Speaking of which, I assume you'll be using the `Nexus` module in later code? If not, you should remove it from the imports (the same goes for `fileinput`, `re`, and `sys`, which this script never uses).
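`Bio.Alphabet`, and with it `IUPAC`, was removed in Biopython 1.78, so the imports above fail on current releases. A minimal sketch of the same `process()` for Biopython 1.78 or later, assuming the inputs are DNA, passes `molecule_type="DNA"` to `SeqIO.convert()` instead of an alphabet. The helper `nexus_path_for()` is a hypothetical name introduced here for the renaming step.

```python
import os


def nexus_path_for(fasta_path):
    """Hypothetical helper: replace the last extension with .nex (cluster1.fa -> cluster1.nex)."""
    base, _ext = os.path.splitext(fasta_path)
    return base + ".nex"


def process(filename):
    """Convert one FASTA file to NEXUS and return the number of records written.

    Assumes Biopython >= 1.78, where molecule_type replaces the removed
    Bio.Alphabet module. The import sits inside the function so that
    nexus_path_for() can be used without Biopython installed.
    """
    from Bio import SeqIO

    return SeqIO.convert(filename, "fasta", nexus_path_for(filename), "nexus",
                         molecule_type="DNA")
```

`os.path.splitext()` strips only the last extension, so `cluster1.fa` becomes `cluster1.nex`, while a name like `archive.tar.gz` would become `archive.tar.nex`.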

I wasn't sure what the point of the last snippet was, so I didn't include it. In it, though, you appear to be walking the file tree and calling `process()` on each file again, then referencing an undefined variable named `count`. Instead, just run `process()` once, and do whatever `count` refers to within that loop.

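You may want to add some logic to your `for` loop to check that each path returned by `os.path.join()` actually is a FASTA file. Otherwise, every other file in the directories you search gets passed to `process()` as well, and a file that contains no FASTA records (for example `.DS_Store`, or a `.nex` file from an earlier run) can produce an empty alignment, which the NEXUS writer rejects with `ValueError: Must have at least one sequence`.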
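One way to add that check, sketched as a hypothetical generator `fasta_files()` that filters on file extension. The extension list is an assumption, so adjust it to your own naming convention; the `os.path.isfile()` test also skips subdirectories whose names happen to end in `.fa`.

```python
import os

# Assumption: these are the extensions your FASTA files use.
FASTA_EXTENSIONS = (".fa", ".fasta", ".fas", ".fna")


def fasta_files(directory):
    """Yield full paths of regular files in `directory` with a FASTA extension."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # str.endswith() accepts a tuple; lower() makes .FASTA match too.
        if os.path.isfile(path) and name.lower().endswith(FASTA_EXTENSIONS):
            yield path
```

Usage would be `for path in fasta_files(test): print(process(path))`. Checking the extension is cheap; a stricter (and slower) alternative would be to open each file and confirm that its first non-blank line starts with `>`.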

EDIT

OK, based on your new code I have a few suggestions. First, the line

```python
files = os.listdir(os.curdir)
```

is unnecessary, because below the definition of the `process()` function you redefine the `files` variable anyway. It wouldn't fail, though: `os.curdir` is not a function but a string constant (`'.'`), so `os.listdir(os.curdir)` simply lists the current working directory.
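A quick check in the interpreter shows what `os.curdir` actually is:

```python
import os

# os.curdir is a plain string naming the current directory, not a function.
print(repr(os.curdir))        # '.'
print(callable(os.curdir))    # False

# Passing it to os.listdir() is therefore the same as os.listdir(".").
print(sorted(os.listdir(os.curdir)) == sorted(os.listdir(".")))
```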

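The code at the bottom should simply be this:

```python
for file in os.listdir(test):
    # os.listdir() returns bare file names, so prefix each one with its directory
    print(process(os.path.join(test, file)))
```

The inner `for file in files` loop is what causes your error: each item from `os.listdir()` is a single filename string, so looping over it yields one character at a time, and the first thing handed to `process()` is `'c'` (for example, the first letter of `cluster1.fa`). Calling `os.path.join()` with a single argument returns that argument unchanged, so pass it both the directory and the filename.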
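To see where the `'c'` in the traceback comes from, this sketch replays both loops on a hypothetical directory listing without touching the file system. The function names are illustrative only.

```python
import os


def buggy_arguments(names):
    """Replay the question's loop and collect what process() would receive."""
    received = []
    for files in names:            # each item is ONE filename string
        for file in files:         # looping over a string yields its characters
            received.append(os.path.join(file))  # one-argument join returns it unchanged
    return received


def fixed_arguments(directory, names):
    """Corrected loop: one full path per file in the listing."""
    return [os.path.join(directory, name) for name in names]


listing = ["cluster1.fa", "cluster2.fa"]  # hypothetical os.listdir() result
print(buggy_arguments(listing)[:3])       # ['c', 'l', 'u']
print(fixed_arguments("/Users/teton/Desktop/test", listing))
```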

Answered by MattDMo.
  • Thank you! Can you change the code to work on the current directory only? I got this error: `File "/Library/Python/2.7/site-packages/Bio/AlignIO/__init__.py", line 214, in write count = writer_class(fp).write_file(alignments) File "/Library/Python/2.7/site-packages/Bio/AlignIO/NexusIO.py", line 98, in write_file self.write_alignment(first_alignment) File "/Library/Python/2.7/site-packages/Bio/AlignIO/NexusIO.py", line 105, in write_alignment raise ValueError("Must have at least one sequence") ValueError: Must have at least one sequence` – Ramon Jul 24 '16 at 01:40
  • @Ramon what do you mean, only the current directory? Just put whatever directory you want to scan in `test`. – MattDMo Jul 24 '16 at 01:42
  • I did, but it keeps saying `ValueError: Must have at least one sequence`. I modified the code to use `test = "/Users/teton/Desktop/bucky"`; there are hundreds of files in the bucky folder. – Ramon Jul 24 '16 at 01:45
  • Look at your `.fa` files. Do they have at least one sequence in them? I just ran the `SeqIO.convert()` code on a FASTA file I happened to have lying around with a single DNA sequence in it, and it produced a `.nex` file just fine, printing `1` (the number of records converted, which is what `SeqIO.convert()` returns). – MattDMo Jul 24 '16 at 01:52
  • Please grab some of the .fa files from here and take a look: https://db.tt/RS7ZJcwP – Ramon Jul 24 '16 at 01:54
  • @Ramon [here](https://dl.dropboxusercontent.com/u/72133618/clusters_with_nexus.zip) are the results, please check them to make sure they look OK. – MattDMo Jul 24 '16 at 02:01
  • My code ran without problem. I didn't do the `os.walk` business, I just used `os.listdir()` to create a list of files in the specified directory, then processed them. – MattDMo Jul 24 '16 at 02:03
  • @Ramon what do you get when you open a Python interpreter and run `import Bio; print(Bio.__version__)`? Also, what do you get when you run `import sys; print(sys.version)`? – MattDMo Jul 24 '16 at 02:03
  • Yes, MattDMo, it's correct. They are in NEXUS format. Thank you so much. I updated my code based on the code you gave for the current directory only, but I got the error I put in my question. `import Bio; print(Bio.__version__)`: 1.67 – Ramon Jul 24 '16 at 02:07
  • Can you provide your latest code in your answer, without `os.walk`? – Ramon Jul 24 '16 at 02:14
  • Voila! I have them now! Thank you so much! – Ramon Jul 24 '16 at 02:19
  1. NameError

You imported `SeqIO` but are calling `seqIO.convert()`. Python is case-sensitive. The line should read:

```python
return SeqIO.convert(filename, "fasta", os.path.splitext(filename)[0] + ".nex", "nexus", alphabet=IUPAC.ambiguous_dna)
```
  2. IOError: `for files in os.walk(test):`

IOError is raised when a file cannot be opened. This often happens because the file name or path you passed does not exist.

`os.walk(test)` walks the directory `test` and all of its subdirectories. On each iteration it yields a tuple of three elements: the path of the current directory, a list of its subdirectories, and a list of the files in it. In `for files in os.walk(test):`, `files` is that whole tuple, so `process(files)` receives a tuple instead of a filename.

You have implemented it correctly in the block `for root, dirs, files in os.walk(test):`. I suggest you write the `for` loop below the same way.

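  3. You are adding `.fa` to your filename. Don't add `.fa`: the names returned by `os.listdir()` and `os.walk()` already include the extension, so `filename + '.fa'` produces paths such as `cluster1.fa.fa` that don't exist.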
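Putting the `os.walk()` and `.fa` fixes together, a minimal sketch with a hypothetical helper `fasta_paths_recursive()`: unpack each tuple from `os.walk()` and join every filename with the directory it was found in, without appending another extension.

```python
import os


def fasta_paths_recursive(top, extension=".fa"):
    """Return full paths of files ending in `extension` under `top`, recursively."""
    paths = []
    # os.walk() yields one (dirpath, dirnames, filenames) tuple per directory.
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(extension):
                # `name` already carries its extension, so join it with the
                # directory it was found in instead of adding ".fa" again.
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

Usage would be `for path in fasta_paths_recursive(test): print(process(path))`.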
Answered by ilyas patanam.
  • Thanks. Now I got this error! return SeqIO.convert(filename + '.fa', "fasta", filename + '.nex', "nexus", alphabet=IUPAC.ambiguous_dna) File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 1003, in convert with as_handle(in_file, in_mode) as in_handle: File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__ return self.gen.next() File "/Library/Python/2.7/site-packages/Bio/File.py", line 88, in as_handle with open(handleish, mode, **kwargs) as fp: IOError: [Errno 2] No such file or directory – Ramon Jul 24 '16 at 00:08
  • I have edited my answer to address your comment. I suggest you edit your question to include the info you provided in the comment, so my answer makes sense for others who come upon this in the future. – ilyas patanam Jul 24 '16 at 00:53
  • Thank you. I will. In fact I don't need subdirectories; the current directory is fine. Now I get this error: No such file or directory: '/Users/teton/Desktop/bucky/cluster1.fa.fa'. Why does it add an extra .fa? cluster1.fa exists. – Ramon Jul 24 '16 at 01:04
  • If you don't need to recurse through subdirectories, look at this [answer](http://stackoverflow.com/questions/11968976/list-files-in-only-the-current-directory). You're getting that error because you've done `filename + '.fa'`. Don't add the `.fa`. – ilyas patanam Jul 24 '16 at 01:08
  • @Ramon because your parameters include `filename + ".fa"`. Either pass `cluster1` instead, or adjust your naming convention. – MattDMo Jul 24 '16 at 01:08
  • @Ramon Are you sure the path `/Users/teton/Desktop/bucky/cluster1.fa` exists? Do you have permissions to read the file? – ilyas patanam Jul 24 '16 at 01:11
  • This is the path: "/Users/teton/Desktop/bucky". The files are cluster*.fa. – Ramon Jul 24 '16 at 01:13
  • Do some debugging. Try `process("/Users/teton/Desktop/bucky/cluster1.fa")` outside of the `for` loop and see if it works. If it doesn't, try opening the file directly in Python with `with open()` and see whether you can read it. – ilyas patanam Jul 24 '16 at 01:16
  • Ok, I'm going to update both the code and the error, this time for files in a directory only. – Ramon Jul 24 '16 at 01:31
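The debugging steps suggested in these comments can be bundled into a small check to run before calling `process()`. This is a hypothetical helper, sketched with the standard library only; it reports whether a path is relative (and therefore resolved against the current working directory), missing, not a regular file, or unreadable.

```python
import os


def diagnose(path):
    """Hypothetical helper: list likely reasons why opening `path` would fail."""
    problems = []
    if not os.path.isabs(path):
        # Relative paths are resolved against the current working directory.
        problems.append("relative path, resolved as " + os.path.abspath(path))
    if not os.path.exists(path):
        problems.append("does not exist")
    elif not os.path.isfile(path):
        problems.append("exists but is not a regular file")
    elif not os.access(path, os.R_OK):
        problems.append("not readable by the current user")
    return problems


print(diagnose("/Users/teton/Desktop/bucky/cluster1.fa"))
```

An empty list means none of these checks found a problem; the file could still fail to parse as FASTA.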