How can I remove or do something with text between two patterns

Question

I would like to delete the lines of text that do not meet any of several conditions. Here is an example:

">"nxp:NX_A0A075B6H9-1 \DbUniqueId=NX_A0A075B6H9-1 \PName=Immunoglobulin lambda variable 4-69 isoform Iso 1 \GName=IGLV4-69 \NcbiTaxId=9606 \TaxName=Homo Sapiens \Length=119 \SV=1 \EV=19 \PE=1 \ModRes=(42||Disulfide) MAWTPLLFLTLLLHCTGSLSQLVLTQSPSASASLGASVKLTCTLSSGHSSYAIAWHQQQP EKGPRYLMKLNSDGSHSKGDGIPDRFSGSSSGAERYLTISSLQSEDEADYYCQTWGTGI

">"nxp:NX_A0A075B6I0-1 \DbUniqueId=NX_A0A075B6I0-1 \PName=Immunoglobulin lambda variable 8-61 isoform Iso 1 \GName=IGLV8-61 \NcbiTaxId=9606 \TaxName=Homo Sapiens \Length=122 \SV=7 \EV=27 \PE=2 \ModRes=(46||Disulfide) MSVPTMAWMMLLLGLLAYGSGVDSQTVVTQEPSFSVSPGGTVTLTCGLSSGSVSTSYYPS WYQQTPGQAPRTLIYSTNTRSSGVPDRFSGSILGNKAALTITGAQADDESDYYCVLYMGS GI

">"nxp:NX_A0A075B6I1-1 \DbUniqueId=NX_A0A075B6I1-1 \PName=Immunoglobulin lambda variable 4-60 isoform Iso 1 \GName=IGLV4-60 \NcbiTaxId=9606 \TaxName=Homo Sapiens \Length=120 \SV=1 \EV=20 \PE=1 \ModRes=(43||Disulfide) MAWTPLLLLFPLLLHCTGSLSQPVLTQSSSASASLGSSVKLTCTLSSGHSSYIIAWHQQQ PGKAPRYLMKLEGSGSYNKGSGVPDRFSGSSSGADRYLTISNLQFEDEADYYCETWDSNT

I only want the entries whose header meets one of the conditions PE=2, PE=4 or PE=5.

I tried to do this with the following code:

list= []
for line in open("nextprot_all.fasta","r"):
    if line.startswith(">") and "PE=2" or "PE=4" or "PE=5" in line:
        list.append(line)

with open('test_1.txt', 'w') as output:
    for i in list:
        output.write(i)

The problem is that with this code the new file contains only the first line of each entry and not the rest of its text.

Is there any way to capture the text between two ">" markers when the condition is True?

The result that I'd like to have is this:

">"nxp:NX_A0A075B6I0-1 \DbUniqueId=NX_A0A075B6I0-1 \PName=Immunoglobulin lambda variable 8-61 isoform Iso 1 \GName=IGLV8-61 \NcbiTaxId=9606 \TaxName=Homo Sapiens \Length=122 \SV=7 \EV=27 \PE=2 \ModRes=(46||Disulfide) MSVPTMAWMMLLLGLLAYGSGVDSQTVVTQEPSFSVSPGGTVTLTCGLSSGSVSTSYYPS WYQQTPGQAPRTLIYSTNTRSSGVPDRFSGSILGNKAALTITGAQADDESDYYCVLYMGS GI

Thank you in advance

#

Program fixed. My question was not about conditionals but about how I could iterate over the lines that follow a matching header.

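# Header fields to keep: PE=2, PE=4 or PE=5, as stated in the question.
# The backslash is doubled because "\P" is not a valid string escape.
wanted = ("\\PE=2", "\\PE=4", "\\PE=5")
kept = []      # avoids shadowing the built-in name list
keep = False   # True while reading a record whose header matched
with open("nextprot_all.peff", "r") as infile:
    for line in infile:
        if line.startswith(">"):
            # A header starts a new record: decide once for the whole record.
            keep = any(tag in line for tag in wanted)
        if keep:
            # Stores the header and every following line up to the next ">".
            kept.append(line)

with open("test_2.txt", "w") as output:
    output.writelines(kept)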
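
For anyone who wants to avoid holding the whole file in memory, here is a minimal sketch of the same flag idea written as a generator. It assumes that headers begin with a bare `>` and carry a `\PE=<digit>` field, as in the sample above. The names `filter_records` and `PE_FIELD` are made up for illustration, and the in-memory `sample` only stands in for the real file.

```python
import io
import re

# Hypothetical names; the \PE=<digit> header field is taken from the sample data.
PE_FIELD = re.compile(r"\\PE=(\d+)")

def filter_records(lines, wanted_levels):
    """Yield every line of each record whose header has a wanted PE level."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # New record: read the PE level from the header and decide once.
            match = PE_FIELD.search(line)
            keep = match is not None and int(match.group(1)) in wanted_levels
        if keep:
            yield line

# Small stand-in for the real file (assumed layout: header line, then sequence lines).
sample = io.StringIO(
    ">nxp:A \\PE=1 \\ModRes=(42||Disulfide)\n"
    "MAWTPLLF\n"
    ">nxp:B \\PE=2 \\ModRes=(46||Disulfide)\n"
    "MSVPTMAW\n"
    "GI\n"
)
result = list(filter_records(sample, {2, 4, 5}))
print("".join(result), end="")
```

With a real file, the open file object can be passed in place of `sample`, and the generator can be handed straight to `output.writelines(...)` so that lines are written as they are read.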

Read the duplicate question closely and check your if statement on line 4 again. — ikkuh, Oct 31 '17 at 13:49
It is, just in a different context. `if "PE=2" or "PE=4" or "PE=5" in line:` does not do what you think it does. — roganjosh, Oct 31 '17 at 13:50
@roganjosh No, it is not. When I use this script, it works perfectly. The problem is that I also want the following lines of text, up to the next ">". — Enrique, Oct 31 '17 at 13:53
Your `if` is _always_ `True` because of the issue raised in the duplicate. Is this what you intend? If so, why have the check at all? — roganjosh, Oct 31 '17 at 13:55
@roganjosh no, it's not always true, for example when PE=1 or PE=3. Those are the pieces of text that I do not want in my new file. — Enrique, Oct 31 '17 at 13:59
What are we arguing here? It _is_ always true because _it does not work how you think it does_. You wrote broken code. You have the option of reading the duplicate and understanding the issue or arguing with me here. One of them will advance your project, the other will persist your problem. Up to you. — roganjosh, Oct 31 '17 at 14:01
Thank you @roganjosh. I have already made the program work. As you can see, it was not a problem about conditionals. — Enrique, Nov 02 '17 at 13:41
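
The precedence point raised in the comments can be checked directly. This is a small sketch on a made-up header line with PE=1: it compares the condition as written in the question with an explicit per-substring test. The variable names are only for illustration.

```python
# Made-up header line with PE=1, which the question wants to exclude.
line = ">nxp:X \\PE=1 \\ModRes=(42||Disulfide)"

# Python parses the original condition as:
#   (line.startswith(">") and "PE=2") or "PE=4" or ("PE=5" in line)
# The bare strings "PE=2" and "PE=4" are truthy, so the PE value never matters.
as_written = line.startswith(">") and "PE=2" or "PE=4" or "PE=5" in line

# Each substring gets its own membership test.
explicit = line.startswith(">") and any(
    tag in line for tag in ("PE=2", "PE=4", "PE=5")
)

print(bool(as_written), explicit)
```

The first value is truthy even though the header says PE=1, while the explicit form rejects the line.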
