0

I have a list of strings, and every string contains the same two markers. I would like to extract the substring between those two markers for each string in the list.

Example:

The markers are 'XXX' and 'YYY', so I want to extract 78665786 and 6866 from the following list:

['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
Subbu VidyaSekar

4 Answers

2

You can loop over your list and grab the substring with a regular expression. For example:

import re

my_list = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
output = []
for item in my_list:
    output.append(re.search('XXX(.*)YYY', item).group(1))

print(output)

Output:

['78665786', '6866']
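
One detail worth knowing about the pattern above: `.*` is greedy, so if the end marker appears more than once in a string, the match runs to the last occurrence. A minimal sketch of the difference, using a made-up input string (not from the question) that contains 'YYY' twice:

```python
import re

# Hypothetical input with the end marker appearing twice.
s = "XXX6866YYYz67YYY67"

# Greedy: .* extends as far as possible, so the match ends at the last YYY.
greedy = re.search(r"XXX(.*)YYY", s).group(1)

# Non-greedy: .*? stops at the first YYY after XXX.
lazy = re.search(r"XXX(.*?)YYY", s).group(1)

print(greedy)  # 6866YYYz67
print(lazy)    # 6866
```

For the two strings in the question both patterns give the same result, since each contains 'YYY' only once.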
Codesidian
  • I got the following: 'NoneType' object has no attribute 'group' – derpaminontas_1992 Jul 27 '20 at 10:29
  • 1
    @derpaminontas_1992, it means that there's no match of pattern in string, so it returned `None`. – Olvin Roght Jul 27 '20 at 10:31
  • @derpaminontas_1992 Could you comment what you've written? As long as your expression and your strings contain the same pattern, it should find your substring. – Codesidian Jul 27 '20 at 10:51
  • I found the mistake. After running the following code, my list contains two strings that don't have forw_primer and rev_primer in them: match=[p for p in fasta_file if forw_primer and rev_primer in p]. The problem is that I don't know why I got those two extra sequences that don't contain the markers. – derpaminontas_1992 Jul 27 '20 at 11:54
  • Use try and except. Make sure to log exactly what wasn't as expected and check that with your dataset and wherever you're getting that data from. – Codesidian Jul 28 '20 at 08:09
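
As the comments explain, `re.search` returns `None` when the pattern is not found, and calling `.group(1)` on `None` raises the error above. A minimal sketch of one way to guard against that, using an explicit `None` check rather than the try/except mentioned above. The function name `extract_between` and the third sample string are hypothetical, added only for illustration:

```python
import re

def extract_between(items, start="XXX", end="YYY"):
    """Return (matches, skipped): extracted substrings, and items lacking the pattern."""
    # re.escape keeps the markers literal even if they contain regex metacharacters.
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end))
    matches, skipped = [], []
    for item in items:
        m = pattern.search(item)
        if m is None:
            # No start...end pair in this string; record it instead of crashing.
            skipped.append(item)
        else:
            matches.append(m.group(1))
    return matches, skipped

found, skipped = extract_between(
    ['XXX78665786YYYjajk', 'XXX6866YYYz6767', 'no markers here']
)
print(found)    # ['78665786', '6866']
print(skipped)  # ['no markers here']
```

Collecting the skipped items makes it easy to inspect which strings did not contain the markers.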
0
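import re
# The original list is truncated in the question; two sample entries are shown here.
l = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
# Capture whatever sits between the markers in each string.
l = [re.search(r'XXX(.*)YYY', i).group(1) for i in l]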

This should work, as long as every string in the list contains both markers.

  • I got this for my example: AttributeError: 'NoneType' object has no attribute 'group' – derpaminontas_1992 Jul 27 '20 at 10:37
  • Can you paste the list you are working with here? This error results from the absence of the given pattern, i.e. 'XXX{some string}YYY'. – Himanshu Jagtap Jul 27 '20 at 10:58
  • I found the mistake. After running the following code, my list contains two strings that don't have forw_primer and rev_primer in them: match=[p for p in fasta_file if forw_primer and rev_primer in p]. The problem is that I don't know why I got those two extra sequences that don't contain the markers. – derpaminontas_1992 Jul 27 '20 at 12:18
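
A likely cause of those extra sequences is how Python parses the condition in that comprehension: `forw_primer and rev_primer in p` is evaluated as `forw_primer and (rev_primer in p)`. A non-empty `forw_primer` is always truthy, so only `rev_primer` is actually tested against `p`. A minimal sketch of the difference follows; the primer values and sequences are made-up stand-ins, since the real data is not shown in the thread:

```python
# Hypothetical stand-ins for the asker's primers and sequences.
forw_primer = "XXX"
rev_primer = "YYY"
fasta_file = ["XXX123YYYabc", "abcYYYdef", "XXXonly"]

# Original filter: parsed as `forw_primer and (rev_primer in p)`,
# so only the presence of rev_primer is checked.
loose = [p for p in fasta_file if forw_primer and rev_primer in p]

# Stricter filter: test each marker against p explicitly.
strict = [p for p in fasta_file if forw_primer in p and rev_primer in p]

print(loose)   # ['XXX123YYYabc', 'abcYYYdef']
print(strict)  # ['XXX123YYYabc']
```

Even the stricter filter only checks that both markers are present, not that the forward marker comes first, so a `None` check on the result of `re.search` is still worthwhile.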
0

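Another solution, if the value between the markers is always the first run of digits in the string, would be:

import re
test_string = ['XXX78665786YYYjajk', 'XXX78665783336YYYjajk']
# \d+ matches the first run of digits; the markers themselves are not used here.
int_val = [int(re.search(r'\d+', x).group()) for x in test_string]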
mpx
-1

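The split() method splits a string into parts around a separator, so you can split on each marker in turn.

list1 = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
list2 = []

for i in list1:
    # Keep what follows the first "XXX", then what precedes the next "YYY".
    after_start = i.split("XXX", 1)[1]
    list2.append(after_start.split("YYY", 1)[0])

print(list2)

The extracted substrings are saved in list2, which prints ['78665786', '6866'].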

P.Rauser