Number of the same characters in a row - python

Question

I have a character (e.g. "a") and I need to check a string (e.g. "aaaabcd") for the number of occurrences of "a" in a row (processing stops at "b" in this case, and the returned value is 4).

I have something like this:

def count_char(str_, ch_):
    count = 0
    for c in str_:
        if c == ch_:
            count += 1
        else:
            return count
    return count  # reached when the whole string consists of ch_

So I was thinking... Is there a better, more pythonic, or simpler way to do this?

This question appears to be a duplicate of http://stackoverflow.com/questions/991350/counting-repeated-characters-in-a-string-in-python which was found with the Google search: "python count repeated characters in string". — Charles Burns, Jun 02 '13 at 18:19
@TimPietzcker - Yes, and if it is `aaabcdaaa` it should return `3` — NZT, Jun 02 '13 at 18:24
@CharlesBurns Thanks for that, maybe I could use that too, but I do not think it is really a duplicate: I need only the number of occurrences in a row, stopping at a different character, no matter whether the counted character appears again later in the string – NZT, Jun 02 '13 at 18:26

score 4 · Accepted Answer · edited Jun 02 '13 at 20:42

The re.match function only matches at the beginning of the string:

import re
def count_char(str_, ch_):
    m = re.match(r'[%s]+' % re.escape(ch_), str_)  # escape regex metacharacters
    return m.end() if m else 0
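
A small sketch contrasting re.match with re.search may make the anchoring clearer. The names leading_run and first_run_anywhere are hypothetical, chosen only for this illustration; re.escape is used so that a character such as ^ or \ is treated literally.

```python
import re

def leading_run(s, ch):
    # re.match only tries to match at index 0, so a run later in the string is ignored
    m = re.match(re.escape(ch) + '+', s)
    return m.end() if m else 0

def first_run_anywhere(s, ch):
    # re.search scans forward and returns the first run wherever it starts
    m = re.search(re.escape(ch) + '+', s)
    return len(m.group()) if m else 0

print(leading_run('aaabcdaaa', 'a'))    # 3
print(leading_run('baaa', 'a'))         # 0: the string does not start with 'a'
print(first_run_anywhere('baaa', 'a'))  # 3: re.search skips the leading 'b'
```

This is why re.match fits the question: the trailing "aaa" in "aaabcdaaa" is never considered.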

If you want the longest run of the character anywhere in the string:

max((len(x) for x in re.findall(r'[%s]+' % re.escape(ch_), str_)), default=0)
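
The same "longest run anywhere" question can also be answered without regular expressions, using itertools.groupby, which splits a string into runs of identical characters. A minimal sketch; longest_run is a hypothetical name, and the default argument to max requires Python 3.4 or later.

```python
from itertools import groupby

def longest_run(s, ch):
    # groupby yields (key, run) pairs for each stretch of identical characters;
    # keep only the runs made of ch and take the longest.
    # default=0 returns 0 instead of raising ValueError when ch never occurs.
    return max((sum(1 for _ in run) for key, run in groupby(s) if key == ch), default=0)

print(longest_run('aabaaaab', 'a'))  # 4
print(longest_run('bcd', 'a'))       # 0
```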

edited Jun 02 '13 at 20:42

jamylak

answered Jun 02 '13 at 18:16

JBernardo

Thanks, re.match works just perfectly, I should have checked the re module documentation more carefully – NZT Jun 02 '13 at 18:28
You have to escape characters that are special in regexes. – georg Jun 02 '13 at 20:04
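
To see which characters actually cause trouble in the unescaped '[%s]+' % ch_ pattern, a quick check like the following can be run. count_leading is a hypothetical helper; the loop only reports whether each unescaped pattern compiles, rather than assuming which characters fail.

```python
import re

def count_leading(s, ch):
    # re.escape turns metacharacters such as '^' or '\\' into literals
    m = re.match('[%s]+' % re.escape(ch), s)
    return m.end() if m else 0

# Report, for a few characters, whether the unescaped bracket pattern compiles
for ch in ['a', '.', '^', '\\', '-']:
    try:
        re.compile('[%s]+' % ch)
        status = 'compiles'
    except re.error:
        status = 're.error'
    print(repr(ch), 'unescaped:', status, '| escaped count:', count_leading(ch * 3 + 'x', ch))
```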

score 4 · Answer 2 · answered Jun 02 '13 at 18:25

One option is to use itertools.takewhile:

>>> from itertools import takewhile
>>> str_ = 'aaaabcd'
>>> ch_ = 'a'
>>> sum(1 for _ in takewhile(lambda x: x == ch_, str_))
4

answered Jun 02 '13 at 18:25

Jared

`ch` isn't a builtin so not sure why you give it a trailing underscore – jamylak Jun 02 '13 at 20:40
@jamylak Right. I was just using OP's variable names. – Jared Jun 02 '13 at 20:44
@jamylak Habit for naming function parameters, but I see that according to PEP8 it is used to avoid conflicts with keywords – NZT Jun 02 '13 at 23:42
@NZT yeah that's why I was confused – jamylak Jun 02 '13 at 23:43

score 2 · Answer 3 · answered Jun 02 '13 at 20:49

If you only care about the beginning of the string, you could use lstrip and compare lengths:

>>> x = "aaaabcd"
>>> len(x) - len(x.lstrip("a"))
4

Maybe not the most efficient way, but most likely the simplest.
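
One caveat with the lstrip approach: its argument is a set of characters to strip, not a prefix, so it agrees with the question's definition only when a single character is passed. A guarded sketch; leading_count is a hypothetical name.

```python
def leading_count(s, ch):
    # str.lstrip treats its argument as a set of characters, not a prefix,
    # so only a single character gives the "run at the start" semantics.
    if len(ch) != 1:
        raise ValueError('ch must be a single character')
    return len(s) - len(s.lstrip(ch))

print(leading_count('aaaabcd', 'a'))    # 4
print(leading_count('aaabcdaaa', 'a'))  # 3
print(repr('abab'.lstrip('ab')))        # '' : every leading 'a' or 'b' is stripped
```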

answered Jun 02 '13 at 20:49

lqc

processing doesn't stop at b in this case – jamylak Jun 02 '13 at 20:51
@jamylak I'm not sure what you mean. – lqc Jun 02 '13 at 21:00
I'm not sure, but `x.lstrip` would have to make a new string (possibly it still uses the memory of the old one); anyway, +1 – jamylak Jun 02 '13 at 21:01
@lqc Thanks, probably the simplest solution – NZT Jun 02 '13 at 23:44

score 0 · Answer 4 · edited Jun 02 '13 at 18:50

You could borrow from the itertools module:

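from itertools import takewhile, groupby

def startcount1(s, c):
    # takewhile stops at the first character that differs from c
    group = takewhile(lambda x: x == c, s)
    return len(list(group))

def startcount2(s, c):
    # groupby yields runs of equal characters; only the first run matters.
    # The default (None, ()) handles an empty string instead of raising StopIteration.
    key, group = next(groupby(s), (None, ()))
    return len(list(group)) if key == c else 0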

After which

tests = ['aaaabcd', 'baaaabcd', 'abacadae', 'aaabcdaaa']
for test in tests:
    # count_char is the function from the question
    results = [f(test, 'a') for f in (count_char, startcount1, startcount2)]
    print(test, *results)

will produce

aaaabcd 4 4 4
baaaabcd 0 0 0
abacadae 1 1 1
aaabcdaaa 3 3 3

If you really cared you could use sum(1 for _ in ..) instead of len(list(..)) to avoid materializing the list, but I find I care less about things like that in my old age. :^)
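
For reference, the sum(1 for _ in ...) variants suggested above could look like this. The next(groupby(s), (None, ())) default is an assumption added here so that an empty string returns 0 instead of raising StopIteration.

```python
from itertools import takewhile, groupby

def startcount1(s, c):
    # sum(1 for _ in ...) counts the items without building a list
    return sum(1 for _ in takewhile(lambda x: x == c, s))

def startcount2(s, c):
    # only the first run produced by groupby is inspected
    key, group = next(groupby(s), (None, ()))
    return sum(1 for _ in group) if key == c else 0

for test in ['aaaabcd', 'baaaabcd', 'abacadae', 'aaabcdaaa', '']:
    print(repr(test), startcount1(test, 'a'), startcount2(test, 'a'))
```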

score 0 · Answer 5 · answered Jun 02 '13 at 20:40

>>> from itertools import takewhile
>>> sum(1 for c in takewhile('a'.__eq__, 'aaaabcd'))
4
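
'a'.__eq__ is simply the bound equality method of the string 'a', so it can serve directly as the takewhile predicate. For string items it behaves like x == 'a', but for a non-string item it returns NotImplemented rather than False; functools.partial(operator.eq, 'a') is an equivalent predicate that avoids calling the dunder directly. A small sketch:

```python
from functools import partial
from itertools import takewhile
from operator import eq

s = 'aaaabcd'
# bound method: 'a'.__eq__(x) performs the same test as 'a' == x for str items
print(sum(1 for _ in takewhile('a'.__eq__, s)))        # 4
# partial(eq, 'a')(x) calls eq('a', x), i.e. 'a' == x
print(sum(1 for _ in takewhile(partial(eq, 'a'), s)))  # 4
```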

answered Jun 02 '13 at 20:40

jamylak
