Questions tagged [genbank]

GenBank format (GenBank Flat File Format) consists of an annotation section and a sequence section. The start of the annotation section is marked by a line beginning with the word "LOCUS".
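
The two-section layout described above can be illustrated with a minimal sketch using only the Python standard library. It assumes that the sequence section begins at a line starting with `ORIGIN` and that the record ends with `//`; both are conventions of the flat file format that the description above does not spell out. The sample record and the function name `split_genbank_record` are made up for illustration and do not come from any library.

```python
# Hypothetical single-record example, invented for this sketch.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE1                  12 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example record.
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 atgcatgcat gc
//
"""


def split_genbank_record(text):
    """Return (annotation_lines, sequence) for a single-record flat file."""
    annotation = []
    sequence_parts = []
    started = False      # becomes True once the LOCUS line is seen
    in_sequence = False  # becomes True once the ORIGIN line is seen
    for line in text.splitlines():
        if not started:
            # The annotation section starts at the LOCUS line.
            if line.startswith("LOCUS"):
                started = True
                annotation.append(line)
            continue
        if line.startswith("//"):
            break  # assumed end-of-record marker
        if line.startswith("ORIGIN"):
            in_sequence = True  # assumed start of the sequence section
            continue
        if in_sequence:
            # Drop the leading position number; join the blocks of bases.
            sequence_parts.extend(line.split()[1:])
        else:
            annotation.append(line)
    return annotation, "".join(sequence_parts)


annotation, sequence = split_genbank_record(SAMPLE_RECORD)
print(len(annotation), sequence)
```

For real files, a dedicated parser such as the one in Biopython is likely the better choice; this sketch only makes the section boundaries visible.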

53 questions

votes

3 answers

Convert FASTA to GenBank

Is there a way to use BioPython to convert FASTA files to a Genbank format? There are many answers on how to convert from Genbank to FASTA, but not the other way around.

biopython fasta genbank

asked May 12 '15 at 03:59

Ricky Su

votes

0 answers

How to match string pattern in R

I'm looking for a good library to extract information from a GenBank (gbk) file using R. This is a common structure of a gbk file: gene complement(1..1002) /gene="bla" /locus_tag="VV1_RS00005" …

r bioinformatics fasta genbank

asked Nov 24 '21 at 05:50

abraham

votes

3 answers

Incomplete parsing of entire genbank file using python/biopython

The main goal of my script is to convert a genbank file to a gtf file. My problem pertains to extracting CDS information (gene, position (e.g., CDS 2598105..2598404), codon_start, protein_id, db_xref) from all CDS entries. My script should…

python biopython genbank

asked Dec 17 '15 at 17:19

cer

votes

3 answers

Modify location of a genbank feature

Edit: I know feature.type will give gene/CDS and feature.qualifiers will then give "db_xref"/"locus_tag"/"inference" etc. Is there a feature. object which will allow me to access the location (e.g. [5240:7267](+)) directly? This URL gives a bit…

python biopython genbank

asked Jul 08 '14 at 16:04

PyKa

votes

1 answer

Genbank query (package seqinr): searching in sequence description

I am using the function query() of package seqinr to download myoglobin DNA sequences from Genbank. E.g.: query("myoglobins","K=myoglobin AND SP=Turdus merula") Unfortunately, for a lot of the species I'm looking for I don't get any sequence at all…

r genbank

asked Jul 15 '13 at 16:14

Lachouette

vote

2 answers

Create set from python loop output and remove duplicates

I've researched the options already available on stack overflow, but none have helped given my lack of understanding of python. I have the following code, which gives the below output. I would like to understand how to remove the duplicate lines…

python loops set genbank

asked Oct 03 '22 at 06:55

Susheel Busi
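
The question's code and output are elided above, so the following is only a generic sketch of one common way to drop duplicates from loop output while keeping first-seen order. The input list and the helper name `unique_in_order` are hypothetical.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first occurrences."""
    seen = set()   # fast membership test for values already emitted
    result = []    # preserves the original order, unlike a plain set
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical loop output, invented for illustration.
lines = ["geneA", "geneB", "geneA", "geneC", "geneB"]
deduplicated = unique_in_order(lines)
print(deduplicated)
```

If order does not matter, `set(lines)` alone would be enough; the helper is only needed when the original order should survive.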

vote

0 answers

Searching for matching sequences in two gb files

I have two Genbank files from which I am extracting the genes as follows: genes_1 = [] for feature in sequence.features: if feature.type=='gene': genes_1.append(feature) That is working just fine, I am able to obtain the sequence,…

biopython genbank

asked Jun 20 '21 at 15:14

Harr1ls

vote

0 answers

Add feature sequence in genbank file with biopython

I'm new to python and biopython, so please bear with me if I ask something really stupid or absurd ;P So I'm working on a group project for school, and I have been asked to write a genbank file which must contain, for each contig: name, circular or not,…

python-3.x biopython fasta genbank

asked Feb 27 '20 at 22:23

mewu3

vote

1 answer

BioPython: How to Parse by "Locus" key in GenBank

I have a Genbank file containing a number of sequences. I have a second text file that contains the names of these sequences, as well as some other information about them, in a TSV, which I read in as a pandas dataframe. I used the .sample function…

python pandas bioinformatics biopython genbank

asked Oct 31 '19 at 02:02

alaskabiologist

vote

1 answer

How to download _full_ RefSeq record using Efetch?

I have a problem downloading a full record from Nucleotide db. I use: from Bio import Entrez from Bio import SeqIO with Entrez.efetch(db="nuccore", rettype="gb", retmode="full", id="NC_007384") as handle: seq_record = SeqIO.read(handle, "gb")…

biopython ncbi genbank

asked Mar 20 '19 at 16:46

Some student

vote

1 answer

Iterating through a series of GenBank genes and appending each gene's features to a list returns only the last gene

I'm having a problem with my code. I'm trying to iterate through the genbank file's list of genes using BioPython. Here's what it looks like: class genBank: gbProtId = str() gbStart = int() gbStop = int() gbStrand =…

python list bioinformatics biopython genbank

asked Mar 12 '19 at 18:10

CelineDion

vote

1 answer

Biopython Genbank.Record : trying to understand source code

I am writing a csv reader to generate Genbank files to capture annotations with sequence. First I used a Bio.SeqRecord and got correctly formatted output but the SeqRecord class lacks fields that I need. Blockquote FEATURES …

biopython genbank

asked Feb 20 '19 at 22:11

JoeT

vote

0 answers

Scraping through pages in multi-paged results of Genbank

Example: http://www.ncbi.nlm.nih.gov/nuccore/?term=trocholejeunea There are 79 items across 4 pages; however, when I move through the pages by clicking "Previous" or "Next", the address turns into http://www.ncbi.nlm.nih.gov/nuccore/ It doesn't…

java html web-crawler genbank

asked Nov 12 '17 at 18:26

passiflora

vote

1 answer

Download only part of genbank file with biopython

I am new to Biopython and I have a performance issue when parsing genbank files. I have to parse a lot of gb files, from which I have the accession numbers. After parsing, I only want to examine the taxonomy and the organelle of the file. Right now,…

python parsing biopython genbank

asked Jul 27 '16 at 13:13

VictorBello

vote

1 answer

How do I edit AND SAVE the sequence of a genbank file to a NEW genbank file using biopython?

I have a .gbk file that's wrong, and I have the list of corrections that follows the format of "Address of Nucleotide: correct nucleotide" 1:T 2:C 4:A 63:A 324:G etc... I know how to open and parse exact original sequence…

python biopython genbank

asked Apr 07 '16 at 01:43

Tom
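
The correction format quoted in the question ("position:base", e.g. 1:T 2:C 4:A) could be applied to a plain sequence string roughly as sketched below. This assumes positions are 1-based, which the excerpt does not state, and it covers only the string edit, not reading or writing the GenBank file itself. The function name `apply_corrections` and the sample sequence are hypothetical.

```python
def apply_corrections(sequence, corrections):
    """Apply 'position:base' corrections to a sequence string."""
    bases = list(sequence)  # strings are immutable, so edit a list copy
    for entry in corrections:
        position_text, base = entry.split(":")
        index = int(position_text) - 1  # assumption: positions are 1-based
        if not 0 <= index < len(bases):
            raise IndexError(f"position {position_text} is outside the sequence")
        bases[index] = base
    return "".join(bases)


# Hypothetical six-base sequence, invented for illustration.
original = "GGGGGG"
corrected = apply_corrections(original, ["1:T", "2:C", "4:A"])
print(corrected)
```

Because each correction replaces exactly one base, the sequence length is unchanged, so existing feature coordinates would still line up with the corrected sequence.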
