
I would like to compute the distances between atoms from PDB files. How can I do this calculation for PDB files?

ATOM      1  N   GLY A  23     -10.507   5.621  25.325  1.00 60.45           N  
ATOM      2  CA  GLY A  23      -9.475   4.636  25.745  1.00 56.55           C
ATOM      3  C   GLY A  23      -8.714   4.045  24.571  1.00 58.66           C
ATOM      4  O   GLY A  23      -8.526   2.829  24.498  1.00 60.74           O 
ATOM      5  N   GLN A  24      -8.275   4.899  23.651  1.00 52.00           N 
ATOM      6  CA  GLN A  24      -7.532   4.446  22.482  1.00 45.40           C 
ATOM      7  C   GLN A  24      -6.089   4.139  22.865  1.00 39.62           C  
ATOM      8  O   GLN A  24      -5.617   4.536  23.928  1.00 35.50           O  
ATOM     14  N   ARG A  25      -5.391   3.428  21.991  1.00 37.97           N 
ATOM     15  CA  ARG A  25      -4.003   3.065  22.237  1.00 37.23           C
ATOM     16  C   ARG A  25      -3.133   4.276  22.555  1.00 36.13           C 
ATOM     17  O   ARG A  25      -2.441   4.293  23.571  1.00 31.46           O 
  • column2 - atom number
  • column3 - atom name
  • column4 - residue name
  • column5 - chain id
  • column6 - residue number
  • column7 - X coordinate
  • column8 - Y coordinate
  • column9 - Z coordinate

distance = sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2)
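The formula above can be applied directly to two ATOM records. This is a minimal sketch that assumes the standard fixed-width PDB columns: x, y and z sit in columns 31-38, 39-46 and 47-54. Slicing by column instead of splitting on whitespace keeps working when adjacent fields run together. The helper names `coords` and `distance` are illustrative and do not come from any library.

```python
import math


def coords(line: str) -> tuple[float, float, float]:
    """Read x, y, z from the fixed-width columns of a PDB ATOM/HETATM record."""
    # 1-based columns 31-38, 39-46 and 47-54 become 0-based slices 30:38, 38:46, 46:54.
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


def distance(line1: str, line2: str) -> float:
    """distance = sqrt((x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2)"""
    (x1, y1, z1), (x2, y2, z2) = coords(line1), coords(line2)
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)


# First two records from the question (N and CA of GLY 23).
atom1 = "ATOM      1  N   GLY A  23     -10.507   5.621  25.325  1.00 60.45           N"
atom2 = "ATOM      2  CA  GLY A  23      -9.475   4.636  25.745  1.00 56.55           C"
print(f"N-CA distance: {distance(atom1, atom2):.3f}")
```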

    Can you rephrase your question for people without a bioinformatics background? It is not at all obvious what you are trying to achieve ("I need to do these calculations based on those columns, thus the output should be X" instead of "I need to calculate distances between atoms and in case of a single chain [...]"). We don't magically know how to interpret your input file. – Adrian Frühwirth May 11 '13 at 10:54
    Search for "distance between atoms" in this forum and just pick the answer you like. – Ed Morton May 11 '13 at 14:41
  • This thread may be of help to you: http://stackoverflow.com/questions/13645439/calculating-the-distance-between-atomic-coordinates – David Cain May 11 '13 at 16:13

1 Answer


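You should refrain from parsing the PDB files yourself. PDB files have a lot of irregularities that tools like awk aren't well suited to handle (for example, the format uses fixed-width columns, so adjacent fields such as large negative coordinates can run together with no separating whitespace). Instead, you should parse the structure into a meaningful object using an already implemented parser.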

I like Biopython. You should look at the tutorial for more on how to interact with structures, but here is a really basic way to get the distance between two atoms. Note that the - operator is overridden to return the distance between two atoms (no need to deal with coordinates or the distance formula!).

from Bio import PDB

parser = PDB.PDBParser()

# Parse the PDB file into a meaningful structure object
pdb_path = "/path/to/files/1abc.pdb"
pdb_id = "1abc"
struct = parser.get_structure(pdb_id, pdb_path)

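# Get two atoms to compare by navigating the SMCRA hierarchy
# (Structure -> Model -> Chain -> Residue -> Atom)
chain_a = struct[0]["A"]  # first model, chain A
res1 = chain_a[26]        # residue number 26
res2 = chain_a[23]        # residue number 23
atom1 = res1["C"]         # backbone carbonyl carbon of each residue
atom2 = res2["C"]

# Subtracting two Atom objects returns the distance between them (in angstroms)
print("Distance: %.2f" % (atom1 - atom2))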
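The code above relies on Biopython's Atom class overloading `-`. To show what that overloading means in plain Python, here is a hypothetical `ToyAtom` class. It is not part of Biopython, and its `__sub__` returns the Euclidean distance instead of a difference vector. Biopython's real Atom keeps its coordinates as a NumPy array in `atom.coord`, while this toy uses a tuple.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical stand-in for Bio.PDB's Atom: just a name and a coordinate."""

    name: str
    coord: tuple[float, float, float]

    def __sub__(self, other: "ToyAtom") -> float:
        # Overloaded "-": return the Euclidean distance between the two atoms.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(self.coord, other.coord)))


# Carbonyl C of GLY 23 and amide N of GLN 24, coordinates from the question.
c_gly23 = ToyAtom("C", (-8.714, 4.045, 24.571))
n_gln24 = ToyAtom("N", (-8.275, 4.899, 23.651))
print(f"Peptide bond length: {c_gly23 - n_gln24:.2f}")
```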

It's slightly unclear to me which atoms you want to calculate distances between, but you can look at the resname field of a PDB.Residue object (e.g. res1, via res1.get_resname()) if you want to compare based on residue name.
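Here is a sketch of comparing atoms by residue name. It assumes Biopython is installed and parses three C-alpha records copied from the question out of an in-memory string, so no file is needed. `get_structure` accepts a file handle as well as a path. The helper `ca_by_resname` is a hypothetical name used only for this example.

```python
import io

from Bio.PDB import PDBParser

# C-alpha records of GLY 23, GLN 24 and ARG 25, copied from the question.
PDB_TEXT = """\
ATOM      2  CA  GLY A  23      -9.475   4.636  25.745  1.00 56.55           C
ATOM      6  CA  GLN A  24      -7.532   4.446  22.482  1.00 45.40           C
ATOM     15  CA  ARG A  25      -4.003   3.065  22.237  1.00 37.23           C
"""

structure = PDBParser(QUIET=True).get_structure("sample", io.StringIO(PDB_TEXT))
chain = structure[0]["A"]


def ca_by_resname(chain, resname):
    """Return the CA atom of every residue in `chain` whose residue name is `resname`."""
    return [res["CA"] for res in chain if res.get_resname() == resname and "CA" in res]


gly_ca = ca_by_resname(chain, "GLY")
arg_ca = ca_by_resname(chain, "ARG")
for a in gly_ca:
    for b in arg_ca:
        # Atom - Atom gives the distance in angstroms.
        print(f"GLY {a.get_parent().id[1]} CA to ARG {b.get_parent().id[1]} CA: {a - b:.2f}")
```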

You may also want to look at Bio.PDB.NeighborSearch to find nearby atoms (it's an implementation of a k-d tree).
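A k-d tree is useful because the naive way to find nearby atoms compares every pair. For intuition, here is that brute-force O(n²) query in the standard library. It is not Biopython's implementation. `pairs_within` is a hypothetical helper, and the coordinates are the three C-alpha atoms from the question.

```python
import itertools
import math


def pairs_within(coords, cutoff):
    """Brute force: return (i, j, distance) for every pair of points within `cutoff`."""
    hits = []
    # combinations() visits each unordered pair exactly once: n*(n-1)/2 distance checks.
    for (i, p), (j, q) in itertools.combinations(enumerate(coords), 2):
        d = math.dist(p, q)
        if d <= cutoff:
            hits.append((i, j, d))
    return hits


# C-alpha coordinates of residues 23, 24 and 25 from the question.
CA_COORDS = [(-9.475, 4.636, 25.745), (-7.532, 4.446, 22.482), (-4.003, 3.065, 22.237)]
for i, j, d in pairs_within(CA_COORDS, 4.0):
    print(f"CA {i} - CA {j}: {d:.2f}")
```

A k-d tree answers the same question while skipping most of those pairwise checks, which matters once a structure has thousands of atoms.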

David Cain
    I'm glad it's easy to understand. I won't write your code for you, but I can recommend performing a `NeighborSearch` on each residue where the cutoff is 4 Angstroms, then getting distances. It should be fairly straightforward. If you read the tutorial, you should have all the necessary tools to accomplish your task. – David Cain May 12 '13 at 05:48
  • wow, NeighborSearch is already built in? nifty. http://biopython.org/DIST/docs/api/Bio.PDB.NeighborSearch'.NeighborSearch-class.html – flies May 16 '13 at 21:21
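The comment's suggestion (a NeighborSearch around each residue with a 4 Å cutoff, then distances) can be sketched as follows. This assumes Biopython is installed and uses the question's records for GLY 23 and GLN 24 from an in-memory string. `residue_contacts` is a hypothetical helper. It reports each inter-residue atom pair once and skips atoms in the same residue.

```python
import io

from Bio.PDB import NeighborSearch, PDBParser

# Residues GLY 23 and GLN 24, copied from the question.
PDB_TEXT = """\
ATOM      1  N   GLY A  23     -10.507   5.621  25.325  1.00 60.45           N
ATOM      2  CA  GLY A  23      -9.475   4.636  25.745  1.00 56.55           C
ATOM      3  C   GLY A  23      -8.714   4.045  24.571  1.00 58.66           C
ATOM      4  O   GLY A  23      -8.526   2.829  24.498  1.00 60.74           O
ATOM      5  N   GLN A  24      -8.275   4.899  23.651  1.00 52.00           N
ATOM      6  CA  GLN A  24      -7.532   4.446  22.482  1.00 45.40           C
ATOM      7  C   GLN A  24      -6.089   4.139  22.865  1.00 39.62           C
ATOM      8  O   GLN A  24      -5.617   4.536  23.928  1.00 35.50           O
"""


def residue_contacts(structure, cutoff=4.0):
    """Return (atom, other_atom, distance) for atoms in different residues within `cutoff` Å."""
    ns = NeighborSearch(list(structure.get_atoms()))  # builds the k-d tree once
    contacts, seen = [], set()
    for residue in structure.get_residues():
        for atom in residue:
            for other in ns.search(atom.coord, cutoff, level="A"):
                if other.get_parent() is residue:
                    continue  # ignore atoms of the same residue (including atom itself)
                key = frozenset((atom.get_serial_number(), other.get_serial_number()))
                if key in seen:
                    continue  # each pair is reported only once
                seen.add(key)
                contacts.append((atom, other, float(atom - other)))
    return contacts


structure = PDBParser(QUIET=True).get_structure("sample", io.StringIO(PDB_TEXT))
for a, b, d in residue_contacts(structure):
    ra, rb = a.get_parent(), b.get_parent()
    print(f"{ra.get_resname()}{ra.id[1]}:{a.get_id()} - {rb.get_resname()}{rb.id[1]}:{b.get_id()}  {d:.2f}")
```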