Find substrings matching a pattern allowing overlaps

Question

I have strings made up of 1s and 0s, each of length 12. Here are some examples:

100010011100
001111110000
001010100011

I want to isolate the sections of each string that start with a 1, are followed by one or more zeros, and end with a 1. Sections may overlap, so a 1 that ends one section can also start the next.

So for the first string, I would want ['10001','1001'].

For the second string, I would want nothing returned.

For the third string, I would want ['101','101','10001'].

I've tried using a combination of positive lookahead and positive lookbehind, but it isn't working. This is what I've come up with so far: `[(?<=1)0][0(?=1)]`

@pault, I'm having a really hard time getting the regex to "go back" and account for parts of the string already included in a previous slice. — crayfishcray, Nov 08 '19 at 21:41
@Code-Apprentice, for example, the string '10101', I would want it to split into ['101','101']. The second '1' would be used twice to form the last character of the first string and the first character of the second string in the list. However, when I used re.search() and re.findall() it only gives me ['101'] — crayfishcray, Nov 08 '19 at 21:50
@crayfishcray Yah, I keep rethinking how I would solve the problem and post a comment before double checking the documentation. My answer below provides a rough sketch of how I would do what you want. — Code-Apprentice, Nov 08 '19 at 21:58
Possible duplicate of [Python regex find all overlapping matches?](https://stackoverflow.com/questions/5616822/python-regex-find-all-overlapping-matches). From the accepted answer, the following should work for you: `re.findall("(?=(10+1))", myString)` — pault, Nov 08 '19 at 21:59
@BowlOfRed: even if it isn't the most straight forward method, I like your approach. (in which only the problematic 1 is in the lookahead and added after to the match result). — Casimir et Hippolyte, Nov 08 '19 at 22:17
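
To illustrate the lookahead pattern from pault's comment, here is a minimal sketch run against the three example strings from the question. The helper name `find_overlapping` is hypothetical and only used for illustration. The last line shows the behaviour described in the comments above, where a plain `10+1` pattern returns only one match for '10101'.

```python
import re

def find_overlapping(s):
    # find_overlapping is a hypothetical helper name, not from the thread.
    # The lookahead (?=...) consumes no characters, so the scan moves on by
    # one position after each hit; the capture group holds the matched text.
    return re.findall(r"(?=(10+1))", s)

for s in ["100010011100", "001111110000", "001010100011"]:
    print(find_overlapping(s))
# ['10001', '1001']
# []
# ['101', '101', '10001']

# For comparison: a plain pattern consumes the shared 1, so the
# second overlapping section is never found.
print(re.findall(r"10+1", "10101"))
# ['101']
```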

score 2 · Accepted Answer · answered Nov 08 '19 at 22:14

For a non-regex approach, you can split the string on 1. The matches you want are the elements of the resulting list that contain a 0, excluding the first and last elements of the list, since those two pieces are not enclosed by a 1 on both sides.

Code:

myStrings = [
    "100010011100",
    "001111110000",
    "001010100011"
]

for s in myStrings:
    matches = ["1"+z+"1" for i, z in enumerate(s.split("1")[:-1]) if (i>0) and ("0" in z)]
    print(matches)

Output:

#['10001', '1001']
#[]
#['101', '101', '10001']
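
To make the intermediate step visible, here is a small sketch of the same idea wrapped in a function. The name `split_matches` is hypothetical, and the sketch assumes the input contains only the characters 0 and 1, as in the question.

```python
def split_matches(s):
    # split_matches is a hypothetical name; assumes s contains only '0' and '1'.
    pieces = s.split("1")
    # The first and last pieces are not enclosed by a 1 on both sides,
    # so only the inner pieces can become matches.
    inner = pieces[1:-1]
    # Empty pieces come from adjacent 1s and contain no zeros.
    return ["1" + z + "1" for z in inner if "0" in z]

print("100010011100".split("1"))
# ['', '000', '00', '', '', '00']
print(split_matches("100010011100"))
# ['10001', '1001']
```

Slicing with `[1:-1]` is equivalent to the `enumerate(...)` with `i>0` over `[:-1]` used in the one-liner above.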

score 0 · Answer 2 · Code-Apprentice · answered Nov 08 '19 at 21:52 · edited Nov 08 '19 at 21:59

I suggest writing a simple regex: r'10+1'. Then use Python logic to find each match with re.search(). After each match, start the next search one position after the start of that match, so that overlapping matches are not skipped.

and where is the code to do that? – Casimir et Hippolyte Nov 08 '19 at 22:22
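
A minimal sketch of what this answer describes might look like the following. It is an interpretation of the suggestion, not code from the answer's author. It uses the `search(string, pos)` method of a compiled pattern, and the function name `overlapping_matches` is hypothetical.

```python
import re

def overlapping_matches(s, pattern=r"10+1"):
    # overlapping_matches is a hypothetical name used for illustration.
    regex = re.compile(pattern)
    results = []
    pos = 0
    while True:
        # Compiled patterns accept a start position as the second argument.
        m = regex.search(s, pos)
        if m is None:
            break
        results.append(m.group(0))
        # Resume one position after the start of this match, so a later
        # match may reuse characters from this one.
        pos = m.start() + 1
    return results

print(overlapping_matches("10101"))
# ['101', '101']
```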

score 0 · Answer 3 · answered Nov 08 '19 at 22:12

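You can't do it in one search with a plain pattern like r'10+1', because the matches from a single scan never overlap. Instead, restart the search at the last character of each match: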

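import re

def parse(s):
    # Yield every substring of the form 1, one or more 0s, 1, including overlaps.
    pattern = re.compile(r'(10+1)')
    match = pattern.search(s)
    while match:
        yield match[0]
        # Resume at the closing 1 so it can also open the next match.
        match = pattern.search(s, match.end()-1)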


RootTwo

