Biopython: how to extract only the relevant atoms and save a PDB file (not locally)?

Question

I am using Biopython and have a list of atom names, rep_atoms = ["CA", "CB", "CD3"] (carbon atoms). From any given PDB file I want to keep only these atoms. I don't want to save the result locally; I want to keep it in memory, because this runs over many iterations. I have arrived at the code below, but it saves the file locally and is very slow. My goal is: for each atom in the PDB, if its name is in rep_atoms, store it in a new_pdb that holds only that information, so that when I use it later in my code it behaves like a PDB file without ever being saved to a local folder on my computer.

How do I append each atom? Printing all the atoms is very fast. I could append them to a list, but then the result would not be a PDB structure file. What should I do?


    from Bio.PDB import .... PDBIO, Select ....

    class rep_atom_Select(Select):
        def accept_atom(self, atom):
            if atom.get_name() in rep_atoms:
                return 1
            else:
                return 0
    
    def rep_atoms_pdb(input_pdb):
        io = PDBIO()
        io.set_structure(input_pdb)
        for model in input_pdb:
            for chain in model:
                for residue in chain:
                    for atom in residue:
                        if atom.get_name() in rep_atoms:
                            print(atom)
    #                        dnr_only = io.save("dnr_only.pdb", rep_atom_Select())

Saving the file after the loop seems like the simple and obvious solution. — tripleee, Dec 08 '21 at 11:51
Judging from the indentation it seems that you were saving the pdb file in the inner for-loop. Did you check the performance when io.save is _not_ in the loop as well? — Lydia van Dyke, Dec 08 '21 at 11:51
@LydiavanDyke Even if the performance is good, it will take a lot of space in my computer. (I need 1000s of versions of a pdb to work with for any given pdb.) — AnythingButThis, Dec 08 '21 at 13:01
One solution I can think of is I extract the atom information and then append in a file. Like Atom, Atom_SeqNo, Res, Res_ID and all. But I think that will break somewhere with the indentation and all. — AnythingButThis, Dec 08 '21 at 13:06
@tripleee I am sorry, but I am unable to understand what you mean. Lydia van Dyke has also said something similar. I cannot place io.save elsewhere; it gives me an indentation error. — AnythingButThis, Dec 08 '21 at 13:11
Why do you now need 1000s of versions? Neither your code nor your problem description indicates that. You described that you want to filter out 3 atom types from a pdb. That creates 1 file, not 1000. — Lydia van Dyke, Dec 08 '21 at 13:16
"I don't want to save it locally; I want it to save in the memory (Lots of iteration)." The CA, CB CD3 serves as example. For each residue I have to pick few atoms (depending on the residue) so that I use only those atom to structurally align the new file with the help of a software. I hope it clears. — AnythingButThis, Dec 08 '21 at 13:18
Your question demands that we are familiar with the file formats and libraries you use, but the basic problem seems to be one which any Python programmer could solve if you made it easier to see where exactly you are stuck. I have posted a temporary answer to suggest in more detail what we all seem to be saying, but it's probably not directly acceptable. — tripleee, Dec 08 '21 at 13:21
Thank you for your reply and answers, @tripleee. I am sorry. I am very new to asking questions on Stack Overflow, and I am not a native English speaker. I know that is no excuse, but I have tried my best to balance the question. I also tagged it with biopython, the module, and tried to say that I don't want to save it. I want it like a .pdb file but internally. Sadly, there is no append method. — AnythingButThis, Dec 08 '21 at 13:30
It's not clear what you mean by "I want it like a `.pdb` file but internally." If you want the atoms as a Python list, create a list and append the ones you want to it. I have updated my answer to demonstrate this. — tripleee, Dec 08 '21 at 13:31
@tripleee https://stackoverflow.com/questions/32634559/how-to-generate-a-file-without-saving-it-to-disk-in-python I am trying to do this but with a pdb file. Might be a very naive question. Once again sorry about it. — AnythingButThis, Dec 08 '21 at 13:37
That creates an object which contains the actual bytes which _would_ be written to disk, but it's unclear whether that's what you actually want. Usually a much better solution is to not flatten the data into an opaque pile of bytes and instead keep a representation in memory which you can actually query and manipulate. As such, this is basically an [XY Problem](https://en.wikipedia.org/wiki/XY_problem). What are you actually trying to achieve? — tripleee, Dec 08 '21 at 13:41
Thank you for replying. I understand that, I hoped that after each iteration (where new file/data generation happens), It will clear the memory in next iteration. I think I understand what I am supposed to do. I will save the file with same name for all iteration so that it gets overwritten. Thank you so much for your patience. @tripleee — AnythingButThis, Dec 08 '21 at 13:48

tripleee · Answer 1 · 2021-12-08T13:52:08.227

Save after the loop, once, instead of thousands of times inside the loop.

    def rep_atoms_pdb(input_pdb):
        my_atoms = []
        for model in input_pdb:
            for chain in model:
                for residue in chain:
                    for atom in residue:
                        if atom.get_name() in rep_atoms: # or if rep_atom_Select().accept_atom(atom):
                            my_atoms.append(atom) # or something like this
        # The function returns the list of extracted atoms
        return my_atoms

Your definition of rep_atom_Select() does not seem to be directly compatible with this design, nor am I sure receiving the atoms as a list is actually what you want, but this should at least give you a nudge in the right direction.
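
If a list of Atom objects turns out not to be what you want, and the goal is PDB-formatted text held in memory, here is a minimal Biopython-free sketch. It is a different technique from the loop above, and it assumes standard fixed-width ATOM/HETATM records, where the atom name occupies columns 13 to 16. The names filter_pdb_text, REP_ATOMS and SAMPLE are hypothetical, made up for illustration.

```python
import io

REP_ATOMS = {"CA", "CB"}  # hypothetical selection, mirroring rep_atoms in the question


def filter_pdb_text(pdb_text, atom_names):
    """Return an in-memory text buffer holding only the selected atom records."""
    out = io.StringIO()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB records are fixed-width: the atom name sits in columns 13-16.
            if line[12:16].strip() in atom_names:
                out.write(line + "\n")
    out.write("END\n")
    out.seek(0)  # rewind so the buffer reads like a freshly opened file
    return out


# Tiny made-up sample: one alanine residue with four atoms.
SAMPLE = (
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C\n"
    "ATOM      3  C   ALA A   1      13.149   5.920  -5.200  1.00  0.00           C\n"
    "ATOM      4  CB  ALA A   1      11.024   4.905  -4.381  1.00  0.00           C\n"
)

buffer = filter_pdb_text(SAMPLE, REP_ATOMS)
kept = [line[12:16].strip() for line in buffer if line.startswith("ATOM")]
print(kept)  # prints ['CA', 'CB']
```

Because the columns are fixed, copying whole lines keeps the layout intact, so nothing should "break with the indentation". The sketch deliberately ignores models, alternate locations and TER records.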

A brief reading of the Bio.PDB.PDBIO documentation suggests that you might simply want to return the actual PDBIO object. I think something like this:

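    from Bio.PDB import PDBIO, Select

    class rep_atom_Select(Select):
        def accept_atom(self, atom):
            # Keep only atoms whose name is listed in rep_atoms
            if atom.get_name() in rep_atoms:
                return 1
            else:
                return 0

    def rep_atoms_pdb(input_pdb):
        # set_structure() belongs to PDBIO, not to the Select subclass
        io = PDBIO()
        io.set_structure(input_pdb)
        # The filter is applied at save time: io.save(target, rep_atom_Select())
        return io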

This is based on a very cursory reading of the documentation, but at least demonstrates how you would use your overridden class to select only some of the atoms in the input_pdb structure.
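
To keep the filtered PDB in memory rather than on disk, one option is to hand PDBIO.save() an io.StringIO buffer instead of a file name, since save() accepts an open handle as well as a path. The sketch below is untested against your data and assumes Biopython is installed. The helper filtered_pdb_text and the SAMPLE record are hypothetical, and rep_atoms is shortened to two names for the demo.

```python
import io

from Bio.PDB import PDBIO, PDBParser, Select

rep_atoms = ["CA", "CB"]  # hypothetical selection for the demo


class rep_atom_Select(Select):
    def accept_atom(self, atom):
        # Keep only atoms whose name is listed in rep_atoms
        return atom.get_name() in rep_atoms


def filtered_pdb_text(structure):
    """Write the selected atoms to an in-memory buffer and return the PDB text."""
    pdb_io = PDBIO()
    pdb_io.set_structure(structure)
    handle = io.StringIO()
    pdb_io.save(handle, rep_atom_Select())  # a handle instead of a file name
    return handle.getvalue()


# Tiny made-up sample: one alanine residue with four atoms.
SAMPLE = (
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C\n"
    "ATOM      3  C   ALA A   1      13.149   5.920  -5.200  1.00  0.00           C\n"
    "ATOM      4  CB  ALA A   1      11.024   4.905  -4.381  1.00  0.00           C\n"
)

parser = PDBParser(QUIET=True)
structure = parser.get_structure("demo", io.StringIO(SAMPLE))

text = filtered_pdb_text(structure)

# The text can be parsed straight back into a Structure, still without touching disk.
filtered = parser.get_structure("filtered", io.StringIO(text))
names = [atom.get_name() for atom in filtered.get_atoms()]
print(names)
```

Each iteration can build a fresh buffer and let it be garbage-collected afterwards. If the alignment software only accepts file paths, this will not help, and overwriting a single temporary file is probably the simpler route.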

Thanks! It was indeed an [XY problem](https://en.wikipedia.org/wiki/XY_problem). — AnythingButThis, Dec 08 '21 at 13:50
Maybe see updated answer now. I have no way to test this but hopefully it should at least help you untangle the concepts here. I have no idea whether `PDBIO` will immediately apply the filter in your subclass but that's how I would guess it works. — tripleee, Dec 08 '21 at 13:53
