
I am trying to isolate certain pieces of information within a DSN6 file. Specifically, I want to find the electron density around two atoms within a protein, and then run some Python code to determine whether there is positive electron density between them. To do so, I believe I need to convert the DSN6 file, which is stored in a very compact binary format, into a format I can understand.

I have searched around and found ways to view DSN6 files as a model (for example in PyMOL, or in Coot from CCP4). However, I do not know of a way to view the decompressed file as text. I have tried to figure out how these programs decompress and read the files, but have been completely unsuccessful.

This is what a piece of a dsn6 file currently looks like:

˘>˝‰‡>™†≠>I14>)B=*À¢<™ÅK=i§=‰≤Ñ=ê8ºO∑ææÑ¸ΩfiZ
Ω2…=wÇ=œjÖ=ÿm=◊6=â(:ÿ◊ôΩl≥æ=sËΩ,vzºËm
>dg>ÄàG>>hº&>è,>SªÂ=

I hope that the decompressed dsn6 file will look similar to what a pdb file looks like. For example,

ATOM    228  OG1 THR A  58       7.843  45.672  59.760  1.00 15.56   O        
ATOM    229  CG2 THR A  58       6.672  44.132  61.223  1.00 14.90   C        
ATOM    230  N   PRO A  59       9.292  42.373  62.754  1.00 13.16   N         
ATOM    231  CA  PRO A  59       9.409  40.982  63.160  1.00 13.07   C        
Ben
  • You can't view a DSN6 file as a model, because it's not a model; it's volumetric data (electron density). For details of the format see https://web.archive.org/web/20170721175122/http://www.uoxray.uoregon.edu/tnt/manual/node104.html Converting it to text won't help you much. If you downloaded this file from RCSB, you can also download the same data in CCP4 format, which is more widely supported. – marcin Jul 11 '19 at 12:09
  • Then is there a way, instead of converting it to text, to isolate a specific region and determine whether it has positive electron density (this would apply specifically to the Fo-Fc map)? – Ben Jul 11 '19 at 13:42
  • I suppose you can check (interpolated) map value at a specified position using cctbx. Maybe also using PyMOL. Or you can code it yourself. – marcin Jul 11 '19 at 19:53
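
As a rough illustration of the "code it yourself" route mentioned in the last comment, here is a minimal sketch of trilinear interpolation on a density grid, sampled along the straight line between two points. It assumes the map is already a 3D NumPy array and that both points are given in grid-index units inside that array. The function names `trilinear` and `sample_between` are made up for this example. Converting atom coordinates in Å to grid indices (using the cell, sampling and origin) and periodic wrapping are deliberately left out.

```python
import numpy as np

def trilinear(grid, p):
    """Trilinearly interpolate `grid` at fractional index p = (x, y, z).

    Assumes p lies inside the array; no periodic wrapping is applied.
    """
    p = np.asarray(p, dtype=float)
    # Lower corner of the enclosing cell, clipped so that corner + 1 is valid.
    i0 = np.clip(np.floor(p).astype(int), 0, np.array(grid.shape) - 2)
    t = p - i0  # fractional offset within the cell, each component in [0, 1]
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = ((t[0] if dx else 1 - t[0])
                          * (t[1] if dy else 1 - t[1])
                          * (t[2] if dz else 1 - t[2]))
                value += weight * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value

def sample_between(grid, a, b, n=11):
    """Interpolated values at n evenly spaced points from a to b (inclusive)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.array([trilinear(grid, a + s * (b - a))
                     for s in np.linspace(0.0, 1.0, n)])
```

With a difference map loaded as `rho`, something like `(sample_between(rho, a, b) > 0).all()` would then be one crude way to test for positive density along the whole segment; what threshold counts as meaningful is a separate question.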

1 Answer


The link provided by @marcin gives some insight into the data format. I've extracted the header, but when I try to reshape the data array I get the wrong number of values:

import struct
import numpy as np
file_path = r'5x22_2fofc.dsn6'
with open(file_path,'rb') as f:
    brick = f.read(512)

header = brick # the first 512-byte brick is our header
n = 2 # number of bytes per entry
entries = [header[i:i + n] for i in range(0, len(header), n)]
header_desc = [
        'x start', # 1
        'y start', # 2
        'z start', # 3
        'x extent', # 4
        'y extent', # 5
        'z extent', # 6
        'x sampling rate', # 7
        'y sampling rate', # 8
        'z sampling rate', # 9
        'Header(18) * A Cell Edge', # 10
        'Header(18) * B Cell Edge', # 11
        'Header(18) * C Cell Edge', # 12
        'Header(18) * alfa', # 13
        'Header(18) * beta', # 14
        'Header(18) * gamma', # 15
        'Header(19) (253 -3) /(rmax -rmin)', # 16
        '(3rmax - 253rmin)/(rmax -rmin)]', # 17
        'Cell Constant Scaling Factor', # 18
        '100'] # 19

header_conv = [struct.unpack('>h',i)[0] for i in entries]
# we now extract the data after the header (using a 512-byte offset)
data = np.memmap(file_path,dtype='uint8',offset=512,mode='r')
for i in zip(header_desc,header_conv):
    print(i)

This is the header I've extracted:

('x start', -20)
('y start', -56)
('z start', -135)
('x extent', 126)
('y extent', 118)
('z extent', 272)
('x sampling rate', 170)
('y sampling rate', 96)
('z sampling rate', 266)
('Header(18) * A Cell Edge', 14912)
('Header(18) * B Cell Edge', 8345)
('Header(18) * C Cell Edge', 23783)
('Header(18) * alfa', 7200)
('Header(18) * beta', 7868)
('Header(18) * gamma', 7200)
('Header(19) (253 -3) /(rmax -rmin)', 1375)
('(3rmax - 253rmin)/(rmax -rmin)]', 80)
('Cell Constant Scaling Factor', 80)
('100', 100)
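
A likely reason for the reshape mismatch, as far as I understand the format description: the density bytes are stored in 8×8×8 bricks of 512 bytes, and partial bricks at the edges are padded. For the extents above that gives 16 × 15 × 34 bricks, i.e. 4,177,920 bytes, rather than 126 × 118 × 272 = 4,044,096 values. The sketch below rearranges the bricks with NumPy and rescales the bytes using header entries 16, 17 and 19. The names `decode_dsn6_bricks` and `to_density` are made up for this example. Two things are assumptions on my part: that bricks and the values inside them are ordered with x fastest, then y, then z, and that a file whose header reads correctly as big-endian also has its data bytes swapped in pairs (controlled by the `byte_swapped` flag).

```python
import numpy as np

def decode_dsn6_bricks(payload, extent, byte_swapped=True):
    """Rearrange DSN6 brick bytes into a (nx, ny, nz) uint8 array.

    payload: the bytes after the 512-byte header.
    extent:  (x extent, y extent, z extent) from the header.
    """
    nx, ny, nz = extent
    # Number of 8x8x8 bricks per axis; edge bricks are padded to full size.
    nbx, nby, nbz = (-(-n // 8) for n in (nx, ny, nz))
    raw = np.frombuffer(payload, dtype=np.uint8, count=nbx * nby * nbz * 512)
    if byte_swapped:
        # Assumption: adjacent bytes are swapped in pairs, undo that here.
        raw = raw.reshape(-1, 2)[:, ::-1].reshape(-1)
    # Axes on disk (assumed): brick z, brick y, brick x, then k, j, i inside.
    bricks = raw.reshape(nbz, nby, nbx, 8, 8, 8)
    # Reorder to (brick x, i, brick y, j, brick z, k) and merge each pair.
    grid = bricks.transpose(2, 5, 1, 4, 0, 3).reshape(nbx * 8, nby * 8, nbz * 8)
    return grid[:nx, :ny, :nz]  # drop the padding

def to_density(grid_bytes, header):
    """Convert stored bytes to density: rho = (byte - plus) / prod."""
    prod = header[15] / header[18]  # entry 16 divided by entry 19 (100)
    plus = header[16]               # entry 17
    return (np.asarray(grid_bytes, dtype=np.float32) - plus) / prod
```

With the variables from the code above, this would be used roughly as `grid = decode_dsn6_bricks(data, header_conv[3:6])` followed by `rho = to_density(grid, header_conv)`. The result is indexed as `rho[x, y, z]` relative to the start values in the header.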
G M