I have strings of concatenated 1s and 0s, each of length 12. Here are some examples:

100010011100
001111110000
001010100011

I want to isolate the sections of each string that start with a 1, continue with one or more 0s, and end with a 1.

So for the first string, I would want ['10001','1001']

The second string, I would want nothing returned

The third string, I would want ['101','101','10001']

I've tried using a combination of positive lookahead and positive lookbehind, but it isn't working. This is what I've come up with so far: [(?<=1)0][0(?=1)]

pault
crayfishcray
  • Easy to do with regex or pyparsing. – Raphael Nov 08 '19 at 21:37
  • @pault, I'm having a really hard time getting the regex to "go back" and account for parts of the string already included in a previous slice. – crayfishcray Nov 08 '19 at 21:41
  • @raphael I will look into pyparsing – crayfishcray Nov 08 '19 at 21:42
  • @Code-Apprentice, for example, the string '10101', I would want it to split into ['101','101']. The second '1' would be used twice to form the last character of the first string and the first character of the second string in the list. However, when I used re.search() and re.findall() it only gives me ['101'] – crayfishcray Nov 08 '19 at 21:50
  • You should probably tag this as "algorithm" – Quentin Nov 08 '19 at 21:52
  • @crayfishcray Yah, I keep rethinking how I would solve the problem and post a comment before double checking the documentation. My answer below provides a rough sketch of how I would do what you want. – Code-Apprentice Nov 08 '19 at 21:58
  • 2
    Possible duplicate of [Python regex find all overlapping matches?](https://stackoverflow.com/questions/5616822/python-regex-find-all-overlapping-matches). From the accepted answer, the following should work for you: `re.findall("(?=(10+1))", myString)` – pault Nov 08 '19 at 21:59
  • @BowlOfRed: even if it isn't the most straight forward method, I like your approach. (in which only the problematic 1 is in the lookahead and added after to the match result). – Casimir et Hippolyte Nov 08 '19 at 22:17
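The lookahead pattern quoted in the comments above can be tried with a minimal sketch like the one below. It assumes Python's standard `re` module; the helper name `find_sections` is hypothetical and only there for illustration. Because the whole `10+1` group sits inside a zero-width lookahead, no characters are consumed, so a 1 that ends one section can also start the next.

```python
import re

# The capturing group lives inside a lookahead, so each match is zero-width
# and the scan moves forward one position at a time.
OVERLAPPING = re.compile(r"(?=(10+1))")

def find_sections(s):  # hypothetical helper name, not from the question
    # With exactly one group, findall returns the group's text for each match.
    return OVERLAPPING.findall(s)

for s in ["100010011100", "001111110000", "001010100011"]:
    print(find_sections(s))
```

On the three example strings this prints `['10001', '1001']`, `[]`, and `['101', '101', '10001']`.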

3 Answers

For a non-regex approach, you can split the string on 1. The matches you want are the elements of the resulting list that contain a 0, excluding the first and last elements, since those are not bounded by a 1 on both sides.

Code:

myStrings = [
    "100010011100",
    "001111110000",
    "001010100011"
]

for s in myStrings:
    matches = ["1"+z+"1" for i, z in enumerate(s.split("1")[:-1]) if (i>0) and ("0" in z)]
    print(matches)

Output:

#['10001', '1001']
#[]
#['101', '101', '10001']
pault

I suggest writing a simple regex: r'10+1'. Then use Python logic to find each match with the search() method of the compiled pattern. After each match, start the next search one position after the beginning of that match, so that a 1 shared by two sections is not skipped.
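A rough sketch of that loop, assuming Python's standard `re` module, might look like this. The function name `find_overlapping` is hypothetical. Note that the module-level `re.search()` takes no start position, so the sketch uses the `search(string, pos)` method of a compiled pattern instead.

```python
import re

PATTERN = re.compile(r"10+1")

def find_overlapping(s):  # hypothetical name, for illustration only
    matches = []
    pos = 0
    while True:
        m = PATTERN.search(s, pos)
        if m is None:
            break
        matches.append(m.group(0))
        # Restart just past the start of this match so that its closing 1
        # can still open the next section.
        pos = m.start() + 1
    return matches

print(find_overlapping("10101"))
```

For the string '10101' from the comments, this yields both '101' sections rather than only the first one.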

Code-Apprentice

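You can't do it with a single plain search, because adjacent sections share a 1. Search repeatedly instead, resuming each time from the last character of the previous match:

import re

def parse(s):
    # Generator: yields each 1-zeros-1 section, including overlapping ones.
    pattern = re.compile(r'(10+1)')
    match = pattern.search(s)
    while match:
        yield match[0]
        # Resume at the closing 1 so it can also open the next section.
        match = pattern.search(s, match.end()-1)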
RootTwo