I want to find the indices of all occurrences of at least two zeros followed by at least two ones (e.g., '0011', '00011', '000111' and so on) in a string called S. The string S may look like:

'00111001100011'

The code I tried can only spot occurrences of '0011', and strangely returns the index of the first '1'. For example, for the S above, my code returns 2 instead of 0:

S = '00111001100011'
# Keep every position n at which the literal text '0011' starts.
index = [n for n in range(len(S)) if S.find('0011', n) == n]

Then I tried to use a regular expression, but the regex I found can't express the specific digit I want (like '0' and '1').

Could anyone kindly come up with a solution, and tell me why my first result returns the index of '1' instead of '0'? Lots of thanks in advance!

  • Correcting `string` to `S`, and changing `n-1` to `n` (I don't know why you thought it was necessary to subtract 1), this works fine. What's the problem? – TigerhawkT3 Feb 16 '17 at 18:16
  • I'm no regexp buff but what's wrong with `00+11+`? – Paul Panzer Feb 16 '17 at 18:23
  • A simple translation of your pattern into regex could be `0{2,}1{2,}` – juanpa.arrivillaga Feb 16 '17 at 18:26
  • `can't express the specific digit I want (like '0' and '1')` sure it can. –  Feb 16 '17 at 18:26
  • @PaulPanzer I think rather `0+1+` since you want at least *two* of each. In any event, laziness vs eagerness will have to be specified by the OP – juanpa.arrivillaga Feb 16 '17 at 18:27
  • @juanpa.arrivillaga - `0+1+` will accept one of each. `00+11+` is "0, then one or more 0s, then 1, then one or more 1s." The OP's code basically works anyway. – TigerhawkT3 Feb 16 '17 at 18:29
  • Corrected, thanks! My problem is that I can only find '0011', but I want the occurrences of at least two 0's followed by at least two 1's ('0011' or '0001111' or '00111' or '000011') in the string S, and I want the index of the first '0' of each occurrence (sorry, a bit confusing). For example, for the S above, I want to return [0, 5, 9] – June_Stephanie Feb 16 '17 at 18:29
  • @TigerhawkT3 oops, you're right. This is why I tend to use the curly bracket quantifiers... – juanpa.arrivillaga Feb 16 '17 at 18:30
  • @juanpa.arrivillaga doesn't `0+` match _one_ or more occurrences? – Paul Panzer Feb 16 '17 at 18:32
  • @June_Stephanie, don't reinvent the wheel, just use a regex as suggested in comments above. – Thierry Lathuille Feb 16 '17 at 18:33
  • @PaulPanzer yes, yes. My mistake :) – juanpa.arrivillaga Feb 16 '17 at 18:33
  • @PaulPanzer I tried `00+11+` but the index list comes back empty. So I assume Python recognizes the expression as a string instead of a regex, searches for the literal text '00+11+' in S, and finds nothing. – June_Stephanie Feb 16 '17 at 18:35
  • It works for me. Try `template = re.compile('00+11+')` and then `for m in template.finditer(S):`; this should cycle through all occurrences. Each `m` will be a match object. Just play with them, they'll give the index and substring – Paul Panzer Feb 16 '17 at 18:43
  • "I assume Python recognizes the expression as a string instead of a regex" - Did you make any effort to use a regular expression, or did you just pass `'00+11+'` to `S.find`? – TigerhawkT3 Feb 16 '17 at 18:43
  • @TigerhawkT3 I was stuck on this for at least an hour, and asking here was my last resort. I searched Stack Overflow for similar questions, learned regex on regex101.com, and tried different functions and modules that appeared useful. But nothing worked for me. Thanks for helping though. – June_Stephanie Feb 16 '17 at 18:50
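
To make the point of the comments above concrete, here is a minimal sketch (not the OP's exact code; the names `literal_hits` and `regex_hits` are made up for illustration). It shows that `str.find` treats its argument as literal text, that scanning for the fixed string '0011' reports 10 rather than 9 for the last run, and that the two patterns suggested above give the same start indices on this S.

```python
import re

S = '00111001100011'

# str.find looks for the literal characters, so '+' is just a plus sign
# here and the search fails with -1.
print(S.find('00+11+'))  # -1

# Scanning for the fixed text '0011' finds where that exact four-character
# string starts, so the run '00011' is reported at 10, not 9.
literal_hits = [n for n in range(len(S)) if S.find('0011', n) == n]
print(literal_hits)  # [0, 5, 10]

# Both suggested patterns mean "two or more 0s, then two or more 1s".
regex_hits = {}
for pattern in (r'00+11+', r'0{2,}1{2,}'):
    regex_hits[pattern] = [m.start() for m in re.finditer(pattern, S)]
    print(pattern, regex_hits[pattern])  # [0, 5, 9] for each pattern
```

The regex has to go through the `re` module; passing it to `S.find` only ever performs a plain substring search.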

1 Answer

In the following code, the regex defines a single instance of the required pattern of digits. The compiled regex's finditer method then yields successive non-overlapping matches in the given string S. match.start() gives the starting position of each match, and the list comprehension collects these positions into starts.

import re

S = '00111001100011'
r = re.compile(r'(0{2,}1{2,})')
starts = [match.start() for match in r.finditer(S)]
print(starts)
# [0, 5, 9]
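
If the matched substrings or their end positions are also of interest, the match objects carry that information as well. A small sketch along the same lines (the variable names `pattern` and `matches` are only illustrative):

```python
import re

S = '00111001100011'
pattern = re.compile(r'0{2,}1{2,}')

# For each non-overlapping match, record where it starts, where it ends
# (exclusive, as in slicing), and the text that was matched.
matches = [(m.start(), m.end(), m.group()) for m in pattern.finditer(S)]
for start, end, text in matches:
    print(start, end, text)
# 0 5 00111
# 5 9 0011
# 9 14 00011
```

Because `+` and `{2,}` are greedy, each match extends over the whole run of 0s and then the whole run of 1s, which is why the first match is '00111' rather than '0011'.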
Bill Bell
Thierry Lathuille