How to cut out matched string

Question

I want to cut out the matched strings.

I'm considering using "[m.start() for m in re.finditer('')]" to get the indices.

But I think there must be a better way than this.

For example, I want to cut out each string that starts with a "header" and ends with the matching "footer".

str = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = ["one": "header1", "two": "header2"]
footers = ["one": "footer1", "two": "footer2"]

#I want to get ["header1svdijfooter1", "header2cdijhfooter2"]

Please advise me.

Please edit your post to fix `headers` and `footers`; declaring them this way is a syntax error: http://pastebin.com/pPNHvJEe Did you mean to create a dictionary? Use `dict()` or `{}`. – Justice Fist, Mar 20 '14 at 05:50

Adam Smith · Accepted Answer · 2014-03-20T05:57:53.943

1

import re

def returnmatches(text, headers, footers):
    """headers is a list of headers
    footers is a list of footers
    text is the text to search"""
    for header, footer in zip(headers, footers):
        pattern = r"{}\w+?{}".format(header, footer)
        try:
            yield re.search(pattern, text).group()
        except AttributeError:
            # re.search returned None: this pair has no match, so skip it
            pass
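Because `returnmatches` uses `yield`, it is a generator and produces nothing until it is consumed. A minimal usage sketch, reusing the question's data as plain lists, collects the results with `list()`:

```python
import re

def returnmatches(text, headers, footers):
    """Yield the first header...footer match for each header/footer pair."""
    for header, footer in zip(headers, footers):
        pattern = r"{}\w+?{}".format(header, footer)
        try:
            yield re.search(pattern, text).group()
        except AttributeError:
            # re.search returned None: no match for this pair
            pass

text = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = ["header1", "header2"]
footers = ["footer1", "footer2"]

# list() drives the generator to completion
matches = list(returnmatches(text, headers, footers))
print(matches)  # ['header1svdijfooter1', 'header2cdijhfooter2']
```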

Or alternatively:

import re

text = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = ["header1", "header2"]
footers = ["footer1", "footer2"]

matches = [re.search(r"{}\w+?{}".format(header, footer), text).group()
           for header, footer in zip(headers, footers)
           if re.search(r"{}\w+?{}".format(header, footer), text)]
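The one-liner above runs `re.search` twice for every header/footer pair (once in the filter and once to get the group). One way to search only once, sketched here with the same example data, is to feed the comprehension from a generator of match objects:

```python
import re

text = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = ["header1", "header2"]
footers = ["footer1", "footer2"]

# The inner generator calls re.search once per pair; the outer comprehension
# keeps only the pairs that matched (re.search returns None otherwise).
searches = (re.search(r"{}\w+?{}".format(header, footer), text)
            for header, footer in zip(headers, footers))
matches = [m.group() for m in searches if m is not None]
print(matches)  # ['header1svdijfooter1', 'header2cdijhfooter2']
```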

edited Mar 20 '14 at 05:57

answered Mar 20 '14 at 05:49

Adam Smith

The question doesn't say whitespace is special, so why restrict the match to `\w`? – demented hedgehog Mar 20 '14 at 06:00
@dementedhedgehog his example doesn't have any whitespace in the string so I didn't want to include anything he didn't explicitly mention in his example. – Adam Smith Mar 20 '14 at 06:02
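To illustrate the trade-off in these comments: `\w+?` only matches letters, digits and underscores between the delimiters, while `.*?` (used in the other answers) matches any character except a newline. A small sketch with a made-up input that contains spaces:

```python
import re

# hypothetical input: same delimiters, but with spaces in the body
text = "header1 sv dij footer1"

word_only = re.search(r"header1\w+?footer1", text)
any_char = re.search(r"header1.*?footer1", text)

print(word_only)         # None: \w does not match the spaces
print(any_char.group())  # header1 sv dij footer1

# '.' does not match a newline unless re.DOTALL is passed
multiline = "header1 sv\ndij footer1"
print(re.search(r"header1.*?footer1", multiline))  # None
print(re.search(r"header1.*?footer1", multiline, re.DOTALL).group())
```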

demented hedgehog · Answer 2 · 2014-03-20T05:56:32.897

import re

# As a general rule, you shouldn't name a variable str in Python, because it shadows the builtin str type.
str = "header1svdijfooter1ccsdheader2cdijhfooter2"

# This is how you declare dicts, but if you're only going to have "one"
# and "two" as keys, why not use a list? (You need {} for dicts.)
#headers = {"one": "header1", "two": "header2"}
#footers = {"one": "footer1", "two": "footer2"}
delimiters = [("header1", "footer1"), ("header2", "footer2")]

results = []
for header, footer in delimiters:

    regex = re.compile("({header}.*?{footer})".format(header=header, footer=footer))

    matches = regex.search(str)
    if matches is not None:
        for group in matches.groups():
            results.append(group)

print(results)
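The answer inserts `header` and `footer` into the pattern unchanged, which works for the question's plain alphanumeric delimiters. If the delimiters might contain regex metacharacters such as `.` or `[`, one option is to pass them through `re.escape` first. A sketch with made-up bracketed delimiters:

```python
import re

# hypothetical delimiters containing regex metacharacters ('[', ']', '.')
text = "[h.1]abc[/h.1]xx[h.2]def[/h.2]"
delimiters = [("[h.1]", "[/h.1]"), ("[h.2]", "[/h.2]")]

results = []
for header, footer in delimiters:
    # re.escape makes the delimiters match literally
    regex = re.compile("({}.*?{})".format(re.escape(header), re.escape(footer)))
    match = regex.search(text)
    if match is not None:
        results.append(match.group(1))

print(results)  # ['[h.1]abc[/h.1]', '[h.2]def[/h.2]']
```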

This is a good use of regular expressions. Would you consider editing your answer to use `str.format` rather than the `%` format operator? http://stackoverflow.com/questions/5082452/python-string-formatting-vs-format – Justice Fist, Mar 20 '14 at 05:53

John1024 · Answer 3 · 2014-03-20T06:32:34.637

The calculation can be done in one line using a list comprehension:

import re

s = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = {"one": "header1", "two": "header2"}
footers = {"one": "footer1", "two": "footer2"}
out = [re.search('({}.*?{})'.format(headers[k], footers[k]), s).group(0) for k in sorted(headers.keys())]

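The above assumes, as in the example, that each header/footer pair matches exactly once: `re.search` returns only the first match, and if a pair has no match it returns `None`, so `.group(0)` raises an `AttributeError`.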

Alternatively, if one prefers looping:

import re

s = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = {"one": "header1", "two": "header2"}
footers = {"one": "footer1", "two": "footer2"}
out = []
for k in sorted(headers.keys()):
    out.extend(re.search('({}.*?{})'.format(headers[k], footers[k]), s).groups())
print(out)

The above produces the output:

['header1svdijfooter1', 'header2cdijhfooter2']
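`re.search` stops at the first match, so each header/footer pair contributes at most one result. If a pair can occur more than once, one possible approach (a sketch with made-up input, not part of the answer above) is to join all pairs into a single alternation and use `re.finditer`, which yields every non-overlapping match in the order it appears in the string:

```python
import re

# hypothetical input where header1...footer1 appears twice
s = "header1aaafooter1header2bbbfooter2header1cccfooter1"
pairs = [("header1", "footer1"), ("header2", "footer2")]

# one alternation covering every header/footer pair
pattern = "|".join(
    "{}.*?{}".format(re.escape(h), re.escape(f)) for h, f in pairs
)
out = [m.group(0) for m in re.finditer(pattern, s)]
print(out)
# ['header1aaafooter1', 'header2bbbfooter2', 'header1cccfooter1']
```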

score 0 · Answer 4 · answered Mar 20 '14 at 06:02

Without re:

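str = "header1svdijfooter1ccsdheader2cdijhfooter2"
result = []
capture = False
currentCapture = ""
for i in range(len(str)):
    # check for a closing footer first, so a header that starts exactly
    # where a footer ends doesn't throw away the finished capture
    if capture and (str[:i].endswith("footer1") or str[:i].endswith("footer2")):
        capture = False
        result.append(currentCapture)
        currentCapture = ""
    if str[i:].startswith("header1") or str[i:].startswith("header2"):
        currentCapture = ""
        capture = True
    if capture:
        currentCapture = currentCapture + str[i]
# the loop never checks a footer that ends at the very end of the string,
# so flush whatever is still being captured
if currentCapture:
    result.append(currentCapture)

print(result)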

Output:

['header1svdijfooter1', 'header2cdijhfooter2']
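Another way to avoid `re`, sketched here as an alternative to scanning character by character, is `str.find` with a start offset, assuming (as in the example) that each header is followed by its own footer. `cut_between` is a hypothetical helper name:

```python
def cut_between(text, pairs):
    """Return text from header through footer for the first occurrence of each pair."""
    results = []
    for header, footer in pairs:
        start = text.find(header)
        if start == -1:
            continue  # header not present
        # look for the footer only after the header
        end = text.find(footer, start + len(header))
        if end == -1:
            continue  # header without a footer
        results.append(text[start:end + len(footer)])
    return results

s = "header1svdijfooter1ccsdheader2cdijhfooter2"
print(cut_between(s, [("header1", "footer1"), ("header2", "footer2")]))
# ['header1svdijfooter1', 'header2cdijhfooter2']
```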
