
I am trying to count the number of separated pairs of 1s in a string using a regex and `re.findall()` in Python. The regex applied to the string `11110` should return 2, and applied to `01101` it should return 1.

My code is the following:

matches = len(re.findall(r'1[\w]+1', str1))

But applied to `110110` it returns 1, as it only finds the substring `11011`. I would expect it to also find the substring `101`. Is my regex wrong, or is `re.findall()` not the function I should be using?

Thomas Tiotto
  • Try using this regex `11+`. Test your Regex [here](https://regex101.com/) before using it in your code. – Kunal Mukherjee Dec 17 '17 at 17:52
  • Try 11{2}. 11+ will not work as regex by nature will find longest match. – Shiv Dec 17 '17 at 17:54
  • @KunalMukherjee There's nothing wrong with the regex. The problem is that `re.findall` does not support overlapping. – klutt Dec 17 '17 at 18:03
  • Can you btw explain why it should give only one match on `01101`? You will have both `101` and `1101`. – klutt Dec 17 '17 at 18:09
  • Define what you mean by *separated pairs of 1s*? Also: `\w` includes `1` in the range of `0-9`, so it is not clear what you are looking for. – dawg Dec 17 '17 at 18:14
  • Also, for 110110 your description tells me it should be four matches. **1**10**1**10 **1**101**1**0 1**1**0**1**10 and 1**1**01**1**0. – klutt Dec 17 '17 at 18:16
  • Perhaps I should have been clearer. I need UNIQUE matches: after finding a match, the 1s that define it shouldn't be reconsidered. That is why `110110` only matches twice. Maybe I could use `find` and insert two 0s in place of the 1s after the first match? But how would I know their exact position? – Thomas Tiotto Dec 18 '17 at 08:21

1 Answer


From the documentation:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings.
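
To make the non-overlapping behavior concrete, here is a small sketch using the pattern and input from the question. The variable names are illustrative only.

```python
import re

# Pattern from the question: a 1, one or more word characters, then a 1.
pattern = r'1[\w]+1'

# findall scans left to right; once '11011' has been consumed, the search
# resumes after it, so the inner '101' is never reported.
non_overlapping = re.findall(pattern, '110110')
print(non_overlapping)  # ['11011']
```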

The `re` module does not support overlapping matches, but the third-party `regex` module does. Install the `regex` module and use it like this:

>>> import regex
>>> regex.findall(r'1[\w]+1', '1111001', overlapped=True)
['1111001', '111001', '11001', '1001']
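
If installing a third-party module is not an option, a commonly used workaround with the standard `re` module is to wrap the pattern in a zero-width lookahead containing a capturing group. This is a sketch of an alternative technique, not what the `regex` call above does internally; the helper name `find_overlapping` is hypothetical, and it assumes the pattern passed in has no capturing groups of its own.

```python
import re

def find_overlapping(pattern, text):
    """Return one match per start position, allowing matches to overlap.

    Hypothetical helper; assumes `pattern` contains no capturing groups.
    """
    # The lookahead consumes no characters, so the scan advances one
    # position at a time; the capturing group records what would match there.
    return re.findall(r'(?=(' + pattern + r'))', text)

print(find_overlapping(r'1[\w]+1', '1111001'))
# ['1111001', '111001', '11001', '1001']
```

Like `overlapped=True`, this still reports overlapping matches, so on its own it does not enforce the "each 1 is used only once" rule described in the comments.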
klutt
  • Thank you, but I tried `111010` and it returns `3`, not `2` as I need. This is because it matches `11101`, `1101` and `101`. I need the 1s used for a match not to be reconsidered. This way `111010` would match only `11101` and `1101`. – Thomas Tiotto Dec 18 '17 at 08:23