
I have a specific pattern at the beginning of each line, and I want to delete that pattern (not the complete line) in Python. After retrieving it from the actual file, my data looks like this:

>homo_seg-Val-abc-1-1
>homo_seg-Beg-cdf-2-1
>homo_seg-Try-gfh-3-2
>homo_seg-Fuss-cdh-3-1

I want to delete ">homo_seg-" from each line and retain only the following:

Val-abc-1-1
Beg-cdf-2-1
Try-gfh-3-2
Fuss-cdh-3-1

I can do this in Perl with:

$new =~s/homo_seg-//g;
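As a rough sketch, the Python equivalent of that Perl substitution is `re.sub`, which replaces every match by default (its `count` argument defaults to 0) and so mirrors the `/g` flag. For a fixed string, `str.replace` does the same without a regex. The variable names here are illustrative only.

```python
import re

line = ">homo_seg-Val-abc-1-1"

# re.sub replaces all non-overlapping matches by default, like Perl's /g
via_regex = re.sub(r"homo_seg-", "", line)

# str.replace also replaces every occurrence of a literal substring
via_replace = line.replace("homo_seg-", "")

print(via_regex)    # >Val-abc-1-1
print(via_replace)  # >Val-abc-1-1
```

Like the Perl pattern, this leaves the leading `>` in place. Include it in the pattern (`">homo_seg-"`) to remove it as well.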

My code is:

import sys
inFile = sys.argv[1]
with open(inFile) as fasta:
    for line in fasta:
        if line.startswith('>'):
            header = line.split()
            t = header[0]

        import re  # from below answer

        regex = r">homo_seg-"

        subst = ""

        result = re.sub(regex, subst, t, 0, re.MULTILINE)
        print(result)

This code only outputs the last line. I know it's some minor error, but I can't spot it.
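One likely cause is a guess, because the exact indentation may have been lost when the code was pasted. If the `re.sub` and `print` lines end up after the `for` loop instead of inside the `if` block, `t` only holds the last header when they run, so only one line is printed. The minimal corrected sketch below assumes the input is a FASTA-style file passed as the first command-line argument. It performs the substitution for each header line inside the loop. The helper name `strip_headers` is hypothetical.

```python
import re
import sys

# Anchored so only a prefix at the start of the header is removed
PREFIX = re.compile(r"^>homo_seg-")


def strip_headers(lines):
    """Return header lines with the leading '>homo_seg-' removed."""
    results = []
    for line in lines:
        if line.startswith(">"):
            header = line.split()[0]  # first whitespace-separated token
            # substitute inside the loop, once per header line
            results.append(PREFIX.sub("", header))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fasta:
        for cleaned in strip_headers(fasta):
            print(cleaned)
```

Run as `python script.py input.fasta`. Each header is printed without the prefix, and lines that do not start with `>` are skipped.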

Azat Ibrakov

2 Answers


Try this:

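new_line = old_line[10:]  # 10 == len('>homo_seg-')

Or, if you want to be extra safe:

if old_line.startswith('>homo_seg-'):
    new_line = old_line[len('>homo_seg-'):]
else:
    new_line = old_line  # leave lines without the prefix unchanged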
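On Python 3.9 and later, `str.removeprefix` performs the same check-and-slice in one call. This is a different method from the slicing above. It removes the prefix only if the string starts with it and otherwise returns the string unchanged, so there is no hard-coded length to get wrong. The sample lines come from the question.

```python
# Requires Python 3.9+ for str.removeprefix
lines = [
    ">homo_seg-Val-abc-1-1",
    ">homo_seg-Beg-cdf-2-1",
    ">homo_seg-Try-gfh-3-2",
    ">homo_seg-Fuss-cdh-3-1",
]

# Only a leading '>homo_seg-' is removed; everything else is left alone
cleaned = [line.removeprefix(">homo_seg-") for line in lines]

for line in cleaned:
    print(line)
```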
lenik

You can test it at https://regex101.com/r/hvFquS/1

import re

regex = r"homo_seg-"

test_str = ("homo_seg-Val-abc-1-1\n"
            "homo_seg-Beg-cdf-2-1\n"
            "homo_seg-Try-gfh-3-2\n"
            "homo_seg-Fuss-cdh-3-1")

subst = ""

result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)
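`re.MULTILINE` only changes how `^` and `$` behave, so it has no effect on a pattern without anchors. The variant sketch below anchors the pattern at the start of every line and makes the `>` optional, so it also handles the data as read from the file. The pattern `^>?homo_seg-` is an assumption about the input format and is not part of the original answer.

```python
import re

# With re.MULTILINE, '^' matches at the start of every line, not just the string
pattern = re.compile(r"^>?homo_seg-", flags=re.MULTILINE)

text = (">homo_seg-Val-abc-1-1\n"
        ">homo_seg-Beg-cdf-2-1\n"
        ">homo_seg-Try-gfh-3-2\n"
        ">homo_seg-Fuss-cdh-3-1")

result = pattern.sub("", text)
print(result)
```

To apply it to a file, read the whole file with `f.read()` and pass that string to `pattern.sub`.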
tanaydin
  • I have updated my question. The above code works fine if the data is provided as in test_str above, but when I provide the data retrieved from the file, I don't get any results. – Kritika Rajain Jul 25 '18 at 05:28
  • Then you can edit the regexp pattern as needed. – tanaydin Jul 25 '18 at 08:07
  • I managed to get the required output using replace. My issue was not with the regexp pattern; rather, my output displays only the last line from the file, probably because of some silly mistake. – Kritika Rajain Jul 25 '18 at 11:48