Questions tagged [genbank]

GenBank format (GenBank Flat File Format) consists of an annotation section and a sequence section. The start of the annotation section is marked by a line beginning with the word "LOCUS".
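
As a rough illustration of that layout, here is a minimal sketch in plain Python (no Biopython) that splits one record into its two sections. It assumes the usual flat-file conventions that the sequence section starts at a line beginning with ORIGIN and that the record ends with a // line. The function name `split_genbank_record` and the sample record are made up for this example.

```python
def split_genbank_record(text):
    """Split one GenBank record into (annotation_lines, sequence_string)."""
    annotation = []
    sequence_parts = []
    started = False      # becomes True once the LOCUS line is seen
    in_sequence = False  # becomes True once the ORIGIN line is seen
    for line in text.splitlines():
        if not started:
            # Skip anything before the LOCUS line that opens the annotation section.
            if line.startswith("LOCUS"):
                started = True
                annotation.append(line)
            continue
        if line.startswith("//"):
            # Assumed record terminator.
            break
        if line.startswith("ORIGIN"):
            # Assumed start of the sequence section.
            in_sequence = True
            continue
        if in_sequence:
            # Keep only letters, dropping position numbers and spaces.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
        else:
            annotation.append(line)
    return annotation, "".join(sequence_parts)


# Hypothetical record, for illustration only.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE1                  12 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical record for illustration.
FEATURES             Location/Qualifiers
     gene            1..12
                     /gene="demo"
ORIGIN
        1 atgcatgcat gc
//
"""

annotation, sequence = split_genbank_record(SAMPLE_RECORD)
print(len(annotation), sequence)
```

For real files, a dedicated parser such as Biopython's `SeqIO.parse(handle, "genbank")` is likely the better choice; the sketch only shows where the two sections begin and end.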

53 questions
6
votes
3 answers

Convert FASTA to GenBank

Is there a way to use BioPython to convert FASTA files to a Genbank format? There are many answers on how to convert from Genbank to FASTA, but not the other way around.
Ricky Su
  • 295
  • 1
  • 7
  • 10
4
votes
0 answers

How to match string pattern in R

I'm looking for a good library to extract information from a genbank (gbk) file using R. This is a common structure of a gbk file: gene complement(1..1002) /gene="bla" /locus_tag="VV1_RS00005" …
abraham
  • 661
  • 8
  • 14
2
votes
3 answers

Incomplete parsing of entire genbank file using python/biopython

The main goal of my script is to convert a genbank file to a gtf file. My problem pertains to extracting CDS information (gene, position (e.g., CDS 2598105..2598404), codon_start, protein_id, db_xref) from all CDS entries. My script should…
cer
  • 1,961
  • 2
  • 17
  • 26
2
votes
3 answers

Modify location of a genbank feature

Edit: I know feature.type will give gene/CDS and feature.qualifiers will then give "db_xref"/"locus_tag"/"inference" etc. Is there a feature. object which will allow me to access the location (e.g. [5240:7267](+)) directly? This URL gives a bit…
PyKa
  • 113
  • 1
  • 7
2
votes
1 answer

Genbank query (package seqinr): searching in sequence description

I am using the function query() of package seqinr to download myoglobin DNA sequences from Genbank. E.g.: query("myoglobins","K=myoglobin AND SP=Turdus merula") Unfortunately, for a lot of the species I'm looking for I don't get any sequence at all…
Lachouette
  • 21
  • 3
1
vote
2 answers

Create set from python loop output and remove duplicates

I've researched the options already available on stack overflow, but none have helped given my lack of understanding of python. I have the following code, which gives the below output. I would like to understand how to remove the duplicate lines…
Susheel Busi
  • 163
  • 8
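
The question's own code and output are not shown in the excerpt above, so the following is only a minimal sketch of one common approach, assuming the loop produces a list of strings: `dict.fromkeys` drops repeats while keeping the first occurrence of each line in order. The helper name `unique_in_order` and the sample lines are hypothetical.

```python
def unique_in_order(lines):
    """Return the lines with duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))


# Hypothetical loop output, for illustration only.
loop_output = [
    "geneA\t100",
    "geneB\t200",
    "geneA\t100",
    "geneC\t300",
    "geneB\t200",
]

unique_lines = unique_in_order(loop_output)
for line in unique_lines:
    print(line)
```

A plain `set(loop_output)` would also remove duplicates, but it does not keep the original order.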
1
vote
0 answers

Searching for matching sequences in two gb files

I have two Genbank files which I am extracting the genes doing the following: genes_1 = [] for feature in sequence.features: if feature.type=='gene': genes_1.append(feature) That is working just fine, I am able to obtain the sequence,…
Harr1ls
  • 71
  • 5
1
vote
0 answers

Add feature sequence in genbank file with biopython

I'm new to python and biopython, so please bear with me if I ask something really stupid or absurd ;P I'm working on a group project for school, and I have been asked to write a genbank file which must contain, for each contig: name, circular or not,…
mewu3
  • 11
  • 1
1
vote
1 answer

BioPython: How to Parse by "Locus" key in GenBank

I have a Genbank file containing a number of sequences. I have a second text file that contains the names of these sequences, as well as some other information about them, in a TSV, which I read in as a pandas dataframe. I used the .sample function…
1
vote
1 answer

How to download _full_ RefSeq record using Efetch?

I have a problem downloading a full record from Nucleotide db. I use: from Bio import Entrez from Bio import SeqIO with Entrez.efetch(db="nuccore", rettype="gb", retmode="full", id="NC_007384") as handle: seq_record = SeqIO.read(handle, "gb")…
Some student
  • 131
  • 2
  • 13
1
vote
1 answer

Iterating through a series of GenBank genes and appending each gene's features to a list returns only the last gene

I'm having a problem with my code. I'm trying to iterate through the genbank file's list of genes using BioPython. Here's what it looks like: class genBank: gbProtId = str() gbStart = int() gbStop = int() gbStrand =…
CelineDion
  • 906
  • 5
  • 21
1
vote
1 answer

Biopython Genbank.Record : trying to understand source code

I am writing a csv reader to generate Genbank files to capture annotations with sequence. First I used a Bio.SeqRecord and got correctly formatted output but the SeqRecord class lacks fields that I need. FEATURES …
JoeT
  • 13
  • 1
  • 4
1
vote
0 answers

Scraping through pages in multi-paged results of Genbank

Example: http://www.ncbi.nlm.nih.gov/nuccore/?term=trocholejeunea There are 79 items across 4 pages; however, when I go through the pages by clicking "Previous" or "Next", the address turns into http://www.ncbi.nlm.nih.gov/nuccore/ It doesn't…
passiflora
  • 332
  • 1
  • 9
1
vote
1 answer

Download only part of genbank file with biopython

I am new to Biopython and I have a performance issue when parsing genbank files. I have to parse a lot of gb files, from which I have the accession numbers. After parsing, I only want to examine the taxonomy and the organelle of the file. Right now,…
1
vote
1 answer

How do I edit AND SAVE the sequence of a genbank file to a NEW genbank file using biopython?

I have a .gbk file that's wrong, and I have a list of corrections in the format "Address of Nucleotide: correct nucleotide": 1:T 2:C 4:A 63:A 324:G etc... I know how to open and parse exact original sequence…
Tom
  • 919
  • 3
  • 9
  • 22
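
The correction format quoted in the question above ("position:base" pairs such as 1:T 2:C 4:A) can be applied to a sequence string with a short helper. This is a minimal sketch, assuming the positions are 1-based and the pairs are separated by whitespace; the function name `apply_corrections` and the toy sequence are hypothetical.

```python
def apply_corrections(sequence, corrections):
    """Apply whitespace-separated 'position:base' corrections to a sequence."""
    bases = list(sequence)
    for item in corrections.split():
        position, base = item.split(":")
        index = int(position) - 1  # assumption: positions are 1-based
        if not 0 <= index < len(bases):
            raise IndexError(f"position {position} is outside the sequence")
        bases[index] = base
    return "".join(bases)


# Toy sequence, for illustration only.
original = "GGGGGGGG"
corrected = apply_corrections(original, "1:T 2:C 4:A")
print(corrected)
```

Writing the corrected sequence to a new GenBank file is a separate step (for instance with Biopython's `SeqIO.write`), which this sketch does not cover.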