
Given a string such as the one below:

string = 'a bcde:Title - 1 xyz;dummy-a bcde:Title - 2.1 xyz;dummy-a bcde:Title - 3.1 xyz;dummy-'

The content I am interested in lies between 'a bcde:' and ' xyz', so in this case I would like to extract these strings (Title - 1, Title - 2.1, Title - 3.1) into a list.

# following is the code
string = 'a bcde:Title - 1 xyz;dummy-a bcde:Title - 2.1 xyz;dummy-a bcde:Title - 3.1 xyz;dummy-'
start = 'a bcde:'
end = ' xyz'
n = [1,2,3]
title_list = []
for index in n:
    title = (string.split(start))[index].split(end)[0]
    title_list.append(title)
print(title_list)

The current code works as expected, but only because the string is short enough that I could count the occurrences by hand and hard-code them (n = [1,2,3]). When the string is too long to count, this approach breaks down. I am looking for a more efficient and explicit way to do this. I expect to get a list of strings containing everything between the start and end patterns, as shown below: ['Title - 1', 'Title - 2.1', 'Title - 3.1', ...]
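The hard-coded n in the loop above is only needed because the split indices are fixed. A minimal sketch of the same split-based idea that instead walks every piece after the first split, assuming each start delimiter is eventually followed by the end delimiter (pieces without it are simply skipped):

```python
string = 'a bcde:Title - 1 xyz;dummy-a bcde:Title - 2.1 xyz;dummy-a bcde:Title - 3.1 xyz;dummy-'
start = 'a bcde:'
end = ' xyz'

# Piece [0] is whatever precedes the first `start`, so it is never a title.
# Every later piece begins with a title that runs up to the first `end`.
# Pieces that do not contain `end` are skipped.
title_list = [piece.split(end)[0] for piece in string.split(start)[1:] if end in piece]
print(title_list)  # ['Title - 1', 'Title - 2.1', 'Title - 3.1']
```

This needs no regex and works for any number of occurrences, since the list comprehension iterates over however many pieces split() produces.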

Thanks!

Ken

1 Answer


Have a look at regex; see e.g. here. You could do:

import re

string = 'a bcde:Title - 1 xyz;dummy-a bcde:Title - 2.1 xyz;dummy-a bcde:Title - 3.1 xyz;dummy-'

print(re.findall(r'a bcde:(.*?) xyz', string))
# ['Title - 1', 'Title - 2.1', 'Title - 3.1']
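The `?` in `(.*?)` matters: it makes the quantifier non-greedy, so each match stops at the first ' xyz' after its 'a bcde:'. A short comparison on the same string, contrasting it with the greedy `(.*)`:

```python
import re

string = 'a bcde:Title - 1 xyz;dummy-a bcde:Title - 2.1 xyz;dummy-a bcde:Title - 3.1 xyz;dummy-'

# Greedy: .* consumes as much as possible, so there is a single match that
# runs from the first 'a bcde:' all the way to the LAST ' xyz'.
greedy = re.findall(r'a bcde:(.*) xyz', string)

# Non-greedy: .*? stops at the first ' xyz' following each 'a bcde:'.
lazy = re.findall(r'a bcde:(.*?) xyz', string)

print(greedy)  # ['Title - 1 xyz;dummy-a bcde:Title - 2.1 xyz;dummy-a bcde:Title - 3.1']
print(lazy)    # ['Title - 1', 'Title - 2.1', 'Title - 3.1']
```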

Or, a bit more versatile, as a function:

def match_between(s, p0, p1):
    # p0 and p1 are regex patterns; capture the shortest text between them
    expr = re.compile(p0 + r'(.*?)' + p1)
    return expr.findall(s)

patterns = (r'a bcde:', r' xyz')
print(match_between(string, *patterns))
# ['Title - 1', 'Title - 2.1', 'Title - 3.1']
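Since `match_between` glues p0 and p1 into the regex unchanged, delimiters containing metacharacters such as '.', '(' or '+' would be interpreted as regex syntax. If the delimiters are meant as plain text, a sketch that escapes them first could look like this (`match_between_literal` is a hypothetical name, not part of `re`):

```python
import re

def match_between_literal(s, start, end):
    """Return every substring of s found between the literal delimiters start and end."""
    # re.escape makes start/end match literally, even if they contain
    # regex metacharacters like '.', '(', ')' or '+'.
    expr = re.compile(re.escape(start) + r'(.*?)' + re.escape(end))
    return expr.findall(s)

string = 'a bcde:Title - 1 xyz;dummy-a bcde:Title - 2.1 xyz;dummy-a bcde:Title - 3.1 xyz;dummy-'
print(match_between_literal(string, 'a bcde:', ' xyz'))
# ['Title - 1', 'Title - 2.1', 'Title - 3.1']

# Delimiters made of regex metacharacters still work:
print(match_between_literal('[x](a)[x](b)', '[x](', ')'))
# ['a', 'b']
```

Note that `.` does not match newlines by default; if the content between the delimiters can span lines, passing `re.DOTALL` to `re.compile` would be needed.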
FObersteiner