MDAnalysis: Trajectories saved from Readers of the same Universe are incorrect

Question

I have three trajectory replicas (xtc) of a membrane protein in a simulated physiological environment (water, ions, membrane...) in a MDAnalysis' (2.2.0) Universe. I want to save other three additional xtcs that contain only the trajectory of the protein (of the atoms of the protein), one per each of the original xtc trajectories. When I try to iterate through each of the three MDAnalysis' Readers contained in the Universe, the first saved trajectory seems to be correct, but the other two have the same coordinates in all the frames. The starting, complete trajectories are correct. If my starting point is necessarily a Universe with the three Readers, how do I do this correctly and efficiently?

Code:

import MDAnalysis as mda
u = mda.Universe("11159_dyn_117.pdb", "11156_trj_117.xtc", "11157_trj_117.xtc", "11158_trj_117.xtc")
protein = u.select_atoms("protein")
protein.write("protein.pdb")

for num, reader in enumerate(u.trajectory.readers, 1):
    with mda.Writer(f"{num}.xtc", protein.n_atoms) as w:
        for ts in reader.trajectory:
            w.write(protein.atoms)

# Then check the generated individual trajectories by loading them in 
# Universes and checking the positions array. I checked them in PyMOL.

Files downloadable at: https://submission.gpcrmd.org/dynadb/dynamics/id/117/ (model file and trajectory files)

I don’t think that we envisaged that people would use the ChainReader (the code that concatenates trajectories on the fly) in this manner and I’m not surprised that the iteration breaks. Either create three separate universes or use the start and stop frames to write three different trajectories. If you think this is a bug please file an issue in the MDAnalysis issue tracker https://GitHub.com/MDAnalysis/mdanalysis/issues — orbeckst, Jul 14 '22 at 08:59
@orbeckst Yes I wouldn't mind using the start and stop frames, because they can be easily accessed from the individual Readers themselves (n_frames). However, can I get some pointers as to what function or module should I use or where I should pass the start and stop? I didn't seem to be able to find it in the Writers docs/to make it work when I thought about this possibility. — Francho Nerín Fonz, Jul 15 '22 at 09:10

score 1 · Accepted Answer · answered Jul 19 '22 at 21:52

You can write a trajectory directly from an AtomGroup with the AtomGroup.write(name, frames=trajectory_iterator) method. Access the start/stop frames in the chained trajectory with the private ChainReader._start_frames attribute (not documented).

import MDAnalysis as mda

# example data
from MDAnalysisTests import datafiles as data

# create a chained trajectory and select some atoms
u = mda.Universe(data.PSF, [data.DCD, data.DCD])
protein = u.select_atoms("protein")

# get start/stop frames: 
# array([  0,  98, 196]) for this example
sf = u.trajectory._start_frames

# write each subtrajectory of the chained trajectory
# to a new file in a different format (only containing
# the atoms of the selected AtomGroup)
for i, (start, stop) in enumerate(zip(sf[:-1], sf[1:])):
    protein.atoms.write(f"protein_{i}.xtc", frames=u.trajectory[start:stop])

This will produce trajectories protein_0.xtc and protein_1.xtc. If you want to load them, don't forget to create a file that contains a minimal topology for the selection

protein.write("protein.gro")

so that you can load the new trajectories with

p1 = mda.Universe("protein.gro", "protein_1.xtc")
p2 = mda.Universe("protein.gro", "protein_2.xtc")

Notes

I am using the start/stop frames from ChainReader._start_frames instead of the raw lengths of trajectories because they are updated appropriately if the ChainReader detects overlapping frames when using continuous=True.
Instead of using frames=u.trajectory[start:stop] one can also use frames=slice(start, stop).
Instead of writing protein.atoms.write() one normally writes protein.write(), which is equivalent and shorter, but I wanted to make clear that it's always correct to go to the atoms, in particular if one wanted to write a whole universe, in which case one would use u.atoms.write().
Instead of using the convenience method AtomGroup.write() one could also use explicit trajectory writing where you open a trajectory for writing with
```
with mda.Writer("protein_1.xtc", protein.n_atoms) as W:
    for ts in u.trajectory[start:stop]:
       W.write(protein)
```
which provides more control over every step but is more verbose.

MDAnalysis: Trajectories saved from Readers of the same Universe are incorrect

1 Answers1