I'm in a bit unusual situation. There are seven different proteins stored in a single file according to their residues names. Each protein has different sequence length. Now I need to calculate the center of mass of each protein and generate a time series data.I know how to do with a single protein, but do not with multiple protein system. For single protein I can do something like this:
import MDAnalysis as mda
import numpy as np
u = mda.Universe('lp400start.gro')
u1 = mda.Merge(u.select_atoms("not resname W and not resname WF and not resname ION"))
u1.load_new('lp400.xtc')
protein = u1.select_atoms("protein")
arr = np.empty((protein.n_residues, u1.trajectory.n_frames, 3))
for ts in u.trajectory:
arr[:, ts.frame] = protein.center_of_mass(compound='residues')
the data files are publicly available here. The residue sequence in the topology file can be checked using grep "^ *1[A-Z]" -B1 lp400final.gro | grep -v "^ *1[A-Z]"
, the output:
38ALA BB 74 52.901 33.911 6.318
--
38ALA BB 148 41.842 29.381 7.211
--
137GLY BB 455 36.756 4.287 3.284
--
137GLY BB 762 44.721 60.377 3.112
--
252HIS SC3 1368 28.682 37.936 6.727
--
252HIS SC3 1974 18.533 46.506 6.314
--
576PHE SC3 3263 48.937 38.538 4.013
--
576PHE SC3 4552 18.513 25.948 3.800
--
1092PRO SC1 6470 42.510 40.992 6.775
--
1092PRO SC1 8388 14.709 4.759 6.370
--
1016LEU SC110524 57.264 56.308 2.632
--
1016LEU SC112660 50.716 14.698 2.728
--
1285LYS SC215345 0.793 33.529 1.509
First protein has sequence length of 38 residues and it has a copy of its own and then the second protein and so on. Now I want to have the COM of each protein at each time frame and build it into a time series. In addition to proteins topology file also contains the DPPC particles as well. Could someone help me how to do this?
Thanks!
To make sure the output trajectory is correct it looks something like this enter link description here