I would like to ask you all a question about superimposing and calculating the RMSD of multiple mmCIF files at once. I am creating a code that downloads a entire homologous superfamily, which then need to be trimmed down based on a specific RMSD value. I want to automate this process in python (within jupyterLab).
The mmCIF files in question contain different proteins. For now I have tried to use BIO.PDB (MMCIFPParser) to first parse the structure from the first .cif file (called mmcif_ref), and then a list of all other files. I want to compare all other protein structures with the reference and calculate a RMSD. However, the problem is that they don't have the same atoms, which I found on the internet, is one of the main criteria.
My current code doesn't work and gives an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[24], line 1
----> 1 rmsd = calculate_rmsd(mmcif_list, mmcif_ref, mmcif_comp)
Cell In[21], line 7, in calculate_rmsd(mmcif_dir, mmcif_ref, mmcif_comp)
4 parser = MMCIFParser()
6 # Parse the structures from the MMCIF files
----> 7 structure1 = parser.get_structure("reference", mmcif_dir + '/' + mmcif_ref)
8 structure2 = parser.get_structure("comparison", mmcif_dir + '/' + mmcif_comp)
10 # Select the atoms for superimposition
TypeError: can only concatenate list (not "str") to list
So my question is, seeing my code, what would you advice me to change in order to be able to superimpose multiple different proteins on one reference protein and save only the cif files that meet a specific rmsd value.
I hope someone can help me. Thanks in advance!
# Initialization
cur_dir = os.getcwd()
mmcif_dir = cur_dir + '/' + protein_name + '/input/cif_files'
output_dir = cur_dir + '/' + protein_name + '/prep'
mmcif_list = []
for file in os.listdir(mmcif_dir):
if file.endswith('.cif'):
mmcif_list.append(file)
mmcif_ref = mmcif_list[0]
mmcif_comp = mmcif_list[1:]
print(mmcif_ref)
print(mmcif_comp)
def calculate_rmsd(mmcif_dir, mmcif_ref, mmcif_comp):
parser = MMCIFParser()
# Parse the structures from the MMCIF files
structure1 = parser.get_structure("reference", mmcif_dir + '/' + mmcif_ref)
structure2 = parser.get_structure("comparison", mmcif_dir + '/' + mmcif_comp)
# Select the atoms for superimposition
atoms1 = Selection.unfold_entities(structure1, "N, CA, C")
atoms2 = Selection.unfold_entities(structure2, "N, CA, C")
# Create an instance of the Superimposer
super_imposer = Superimposer()
# Set the atoms for superimposition
super_imposer.set_atoms(atoms1, atoms2)
# Apply the transformation to the atoms of structure2
super_imposer.apply(structure2.get_atoms())
# Calculate the RMSD
rmsd = super_imposer.rms
return rmsd
rmsd = calculate_rmsd(mmcif_list, mmcif_ref, mmcif_comp)