Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each letter corresponding to a different amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description, and so on).

The standard FASTA format looks like this:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................
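
As a rough illustration of the layout above, here is a minimal Python sketch that splits FASTA text into (header, sequence) pairs. The function name `parse_fasta` and the choice to read the accession as the text before the first `|` are assumptions made for this example, not a standard API; for real work an established parser such as Biopython's `Bio.SeqIO` is probably the safer choice.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: the header is returned without the leading '>',
    and wrapped sequence lines are joined into a single string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


# The two records shown in the tag description above.
example = """\
>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    accession = header.split("|")[0]  # assumed: accession precedes the first '|'
    print(accession, len(sequence))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, so a large database does not need to be read into memory at once.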

A great deal of work in bioinformatics involves storing (annotating), searching, and analyzing the sequences in these databases.

145 questions
12
votes
4 answers

Cut within a pattern using Python regex

Objective: I am trying to perform a cut in Python RegEx where split doesn't quite do what I want. I need to cut within a pattern, but between characters. What I am looking for: I need to recognize the pattern below in a string, and split the string…
Michael Molter
11
votes
3 answers

Use grep to find either of two strings without changing the order of the lines?

I'm sure this has been asked but I can't find it so my apologies for redundancy. I want to use grep or egrep to find every line that has either ' P ' or ' CA ' in them and pipe them to a new file. I can easily do it with one or the other…
Steven C. Howell
5
votes
3 answers

Protein structure visualization

I've been asked to work on protein structure visualization, something like RasMol, where a user opens a pdb file to get the protein structure. How can I generate the protein structure from the pdb file? I would like to code in Python and to…
Shane
5
votes
1 answer

Deleting residue from PDB using Biopython library

Using the Biopython library, I want to remove the residues that are given in a list, as follows. This thread (http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython) provides an example of removing a residue. I have following…
Exchhattu
4
votes
3 answers

Extract all substrings in string

I want to extract all substrings that begin with M and are terminated by a * The string below as an example; vec<-c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ") Would ideally return; MGMTPRLGLESLLE MTPRLGLESLLE I have tried the code below; regmatches(vec,…
Nosey
4
votes
2 answers

amino acid binding site finding, protein database

I am trying to find out whether two atoms that belong to two different chains would be considered 'bound' or not. This is based on the fact that if the distance (Euclidean, which can be found from the given x, y, z coordinates) is shorter than the…
aleatha
4
votes
2 answers

How to implement SIFT (Scale-invariant feature transform) for 3D image in Python?

I have seen many examples of SIFT for 2-dimensional images only: http://docs.opencv.org/3.1.0/da/df5/tutorial_py_sift_intro.html. But Wikipedia says that SIFT may be applied to "3D modelling" as well. Please help me to find examples for…
LeonK
3
votes
2 answers

How to find similarity percentage for multiple aligned sequences

My question is related to protein sequence alignment. When I use ClustalW for alignment I can see the identity percentage, strongly similar and weakly similar. But I want to find the similarity percentage of all aligned sequences, not the identity. I googled…
3
votes
1 answer

How to save each ligand from a PDB file separately with Bio.PDB?

I have a list of PDB files. I want to extract the ligands of all the files (so, heteroatoms) and save each one separately into PDB files, by using the Bio.PDB module from BioPython. I tried some solutions, like this one: Remove heteroatoms from PDB…
MathB
3
votes
1 answer

DNA to RNA and Getting Proteins with Perl

I am working on a project (I have to implement it in Perl, but I am not good at it) that reads DNA and finds its RNA. It then divides that RNA into triplets to get the equivalent protein name. I will explain the steps: 1) Transcribe the following DNA…
kamaci
3
votes
0 answers

How to unfold only protein atoms using Bio.PDB.Selection?

from Bio.PDB import PDBParser
from Bio.PDB import Selection
structure = PDBParser().get_structure('4GBX', '4GBX.pdb') # load your molecule
atom_list = Selection.unfold_entities(structure[0]['E'], 'A') # 'A' is for Atoms in the chain 'E'
When I…
3
votes
2 answers

How to separately get the X, Y or Z coordinates from a pdb file

I have a PDB file '1abz' (https://files.rcsb.org/view/1ABZ.pdb), which contains the coordinates of a protein structure. Please ignore the header remark lines; the interesting information starts at line 276, which says 'MODEL 1'. I would…
Cave
3
votes
1 answer

Glitch in Pandas? Cannot overwrite value

So I tried running a code I had developed previously, which has run numerous times nicely using pandas. My dataframe has a custom index (with unique string values as the index, representing a unique identifier, in this case, individual proteins),…
Alex Huszagh
3
votes
1 answer

Biopython: How to avoid particular amino acid sequences from a protein so as to plot Ramachandran plot?

I have written a python script to plot the 'Ramachandran Plot' of Ubiquitin protein. I am using biopython. I am working with pdb files. My script is as below : import Bio.PDB import numpy as np import matplotlib as mpl import matplotlib.pyplot as…
dexterdev
3
votes
2 answers

How do I output a .pdb file using python script?

I'm currently in the process of manipulating a .pdb (Protein Data Bank) file in Python. My end goal is to turn the Python script's output back into a pdb file so that I can run simulations in either VMD or PyMol. Can someone please help?
Alicia Burns