37

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?

At first, I thought I could do something like:

word = '1000'

counter = 0
print range(len(word))

for i in range(len(word) - 1):
    while word[i] == word[i + 1]:
        counter += 1
        print counter * "0"
    else:
        counter = 1
        print counter * "1"

So that in this manner I could see the number of times each unique digit repeats. But this, of course, falls out of range when i reaches the last value.

In the example above, I would want Python to tell me that 1 repeats 1, and that 0 repeats 3 times. The code above fails, however, because of my while statement.

How could I do this with just built-in functions?

mkrieger1
  • 19,194
  • 5
  • 54
  • 65
vashts85
  • 1,069
  • 3
  • 14
  • 28
  • 1
    What wrong with using `len(word) - 1`? Would also think you would need to initialize `counter` to 1 – SGM1 Dec 23 '15 at 21:25
  • OK that actually helps a lot .... I'll keep working and see if I can come up with a solution! – vashts85 Dec 23 '15 at 21:27
  • 1
    why dont you add another if clause checking if i is bigger than len(word) – ivan Dec 23 '15 at 21:28
  • 1
    Possible duplicate of [Count number of specific elements in between other elements in list](http://stackoverflow.com/questions/34401733/count-number-of-specific-elements-in-between-other-elements-in-list) – baldr Dec 23 '15 at 21:29
  • If your string was, instead, `'100011'` what would you want the output to be? My answer assumes `[("1", 1), ("0", 3), ("1", 2)]` but maybe you want something more nuanced than that? – Adam Smith Dec 23 '15 at 22:04
  • Does this answer your question? [Counting consecutive characters in a string](/q/13197668/90527) – outis Nov 06 '22 at 05:45

16 Answers16

80

Consecutive counts:

You can use itertools.groupby:

s = "111000222334455555"

from itertools import groupby

groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]

After which, result looks like:

[("1": 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]

And you could format with something like:

", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"

Total counts:

Someone in the comments is concerned that you want a total count of numbers so "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:

from collections import Counter

s = "11100111"
result = Counter(s)
# {"1":6, "0":2}

Your method:

As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This leads to an off-by-one error when i is pointing at the last index of s, so i+1 raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more pythonic to generate something to iterate over.

For string that's not absolutely huge, zip(s, s[1:]) isn't a a performance issue, so you could do:

counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1

The only problem being that you'll have to special-case the last character if it's unique. That can be fixed with itertools.zip_longest

import itertools

counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1

If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise.

def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

counts = []
count = 1
for a, b in pairwise(s):
    ...
mkrieger1
  • 19,194
  • 5
  • 54
  • 65
Adam Smith
  • 52,157
  • 12
  • 73
  • 112
  • @baldr thanks for the `list` cast edit. I forgot it gives you some funky view instead with no `__len__` defined. I prefer to sum the generator in those cases rather than building a list to throw away but YMMV – Adam Smith Dec 23 '15 at 22:09
  • Isn't this just counting? What if you separate some of the numbers, like `11122111`? – code_dredd Dec 23 '15 at 22:10
  • @ray Then you get `[("1", 3), ("2", 2), ("1", 3)]`. The OP wants *consecutive* characters. – Adam Smith Dec 23 '15 at 22:11
  • If you think they want a count of *all* characters, then you just do `collections.Counter(s)` and call it a day. – Adam Smith Dec 23 '15 at 22:12
  • @AdamSmith: I understand what the OP wants. The code you provided groups each consecutive group, and does not provide the total for all the sub-groups you've broken this into, i.e. I was expecting `("1": 6)` instead of 2 tuples with `("1": 3)`. So this seems to be slightly different from what was asked, AFAIK – code_dredd Dec 23 '15 at 22:14
  • @ray I don't believe that's what the OP wants, but have included that canonical solution in case for some reason he means the opposite of what his question says. – Adam Smith Dec 23 '15 at 22:15
  • @adam-smith, I also think that the first part is what exactly was needed. This answer is the best as for me. It uses true pythonic-way. – baldr Dec 23 '15 at 22:16
  • @ray you'll notice that his sample code only loops while the current letter is the same as the next letter, not while the current letter still exists in the string. – Adam Smith Dec 23 '15 at 22:17
  • @AdamSmith: As I told baldr, that's not what I understood when the OP said he/she wanted to "see the number of times *each unique digit repeats*" (my emphasis). To me, that implies getting totals and says nothing about the sub-grouping that you and baldr want to produce. I think the question could've been clearer by posting a sample output. – code_dredd Dec 23 '15 at 22:20
  • @ray well he did post a sample output, but it's not clear which meaning he's going for since his sample input doesn't have any repeating but non-consecutive characters. – Adam Smith Dec 23 '15 at 22:29
  • 1
    @AdamSmith: I gave you my +1. I think your answer is good and the use of built-in functionality is also better. It was at least my understanding (before OP's edit) that the OP was interested in an algorithm rather than built-in way of doing it. – code_dredd Dec 23 '15 at 22:29
  • @AdamSmith: Yes, but you know I meant the ambiguity here is with the subgroupings, which the OP did not include in the sample, as you mentioned. – code_dredd Dec 23 '15 at 22:30
  • @AdamSmith, why not `len(list(group))` vs `sum(1 for _ in group)`? Also, what's the underscore, a wildcard? – alancalvitti Jan 24 '19 at 18:02
  • @alancalvitti oof this is a *very* old answer :). `len(list(group))` has to allocate a list, only to access one attribute of it (`__len__`), then throw it away. `sum(1 for _ in group)` doesn't need to do that. This is mostly performance for performance's sake, but if `group`'s length was in the many hundreds of thousands, it's significant. `_` is just a placeholder here. This is a Python convention that means "The syntax requires that I bind this value to a name, but I don't actually USE this value." Also used a lot as `for _ in range(10): action()` (which does calls `action` 10 times). – Adam Smith Jan 24 '19 at 19:48
  • @AdamSmith, thanks for explaining. Why doesn't `group` (and similar container objects) support `len` method? Ps in WL, it's `Range[10] // Map[action]`, haven't used for loops since C/C++, it's like going back to assembly. – alancalvitti Jan 25 '19 at 15:10
  • @alancalvitti the group object is not a container, it's a generator. Iterating over it to find out when it stops producing values would *consume* those values, and having `len(my_generator)` cause `my_generator` to be empty afterwards is non-obvious. Sounds like WL is more of a functional language -- of course it wouldn't have loops :) – Adam Smith Jan 25 '19 at 16:50
  • @AdamSmith, could `group` support a `len` method or attribute similar to `DataFrame` `shape`? I understand generators are lazy/JIT and can be used with streams where length is undefined, but clearly in this situation length is determined. – alancalvitti Jan 25 '19 at 18:52
  • @alancalvitti I'm not sure what `groupby` is doing under the hood, but I don't *think* it's doing any work ahead of time, so calculating the length and storing it in an attribute so you can look it up later is doing a lot of work and potentially costing quite a bit of memory to keep from losing those objects when they've run out of the generator. I'm not sure that it's worth avoiding the common `sum(1 for _ in some_generator)` idiom. – Adam Smith Jan 25 '19 at 20:44
  • 1
    This is a bit old post, but I was trying this `zip_longest` solution and noticed that a reset of counts, `count = 1`, is also necessary as the last `else` statement (just as in the case of `zip`), otherwise the result will not be correct. – PedroA Sep 21 '19 at 00:16
25

A solution "that way", with only basic statements:

word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
    for i in range(1,len(word)):
       if word[i-1]==word[i]:
          count+=1
       else :
           length += word[i-1]+" repeats "+str(count)+", "
           count=1
    length += ("and "+word[i]+" repeats "+str(count))
else:
    i=0
    length += ("and "+word[i]+" repeats "+str(count))
print (length)

Output :

'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'1 repeats 1'
remedcu
  • 526
  • 1
  • 10
  • 31
B. M.
  • 18,243
  • 2
  • 35
  • 54
3

Totals (without sub-groupings)

#!/usr/bin/python3 -B

charseq = 'abbcccdddd'
distros = { c:1 for c in charseq  }

for c in range(len(charseq)-1):
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1

print(distros)

I'll provide a brief explanation for the interesting lines.

distros = { c:1 for c in charseq  }

The line above is a dictionary comprehension, and it basically iterates over the characters in charseq and creates a key/value pair for a dictionary where the key is the character and the value is the number of times it has been encountered so far.

Then comes the loop:

for c in range(len(charseq)-1):

We go from 0 to length - 1 to avoid going out of bounds with the c+1 indexing in the loop's body.

if charseq[c] == charseq[c+1]:
    distros[charseq[c]] += 1

At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):

# replacing vars for their values
if charseq[1] == charseq[1+1]:
    distros[charseq[1]] += 1

# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
    distros['b'] += 1

You can see the program output below with the correct counts:

➜  /tmp  ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}
code_dredd
  • 5,915
  • 1
  • 25
  • 53
  • He don't need just a count. He asks for the consecutive chars. Like: `aabbbcdefabc`: `a:2, b:3, c:1, d:1, ...` – baldr Dec 23 '15 at 21:33
  • Replace `for c in sentence` with `for c in zip(sentence, sentence[1:])` – inspectorG4dget Dec 23 '15 at 21:36
  • @inspectorG4dget: Did something a bit different as that change would not have worked with the previous code. – code_dredd Dec 23 '15 at 21:54
  • @baldr: It produces `{'a': 6, 'd': 4, 'f': 2, 'b': 2, 'c': 3}`. It looks ok to me. If it won't work, can you be more specific on what, exactly, you think the issue is? – code_dredd Dec 23 '15 at 22:11
  • @ray, expected output should be `a:1, b:2, c:3, d:4, a:4, f:2, a:3` – baldr Dec 23 '15 at 22:12
  • @baldr: That's not what I understood when the OP said he/she wanted to "see the number of times *each unique digit repeats*" (my emphasis). To me, that implies getting totals and says nothing about the sub-grouping that you and Adam are producing. – code_dredd Dec 23 '15 at 22:19
  • @baldr's comment describes what I was going for. – vashts85 Dec 24 '15 at 02:23
1

You only need to change len(word) to len(word) - 1. That said, you could also use the fact that False's value is 0 and True's value is 1 with sum:

sum(word[i] == word[i+1] for i in range(len(word)-1))

This produces the sum of (False, True, True, False) where False is 0 and True is 1 - which is what you're after.

If you want this to be safe you need to guard empty words (index -1 access):

sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))

And this can be improved with zip:

sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
Reut Sharabani
  • 30,449
  • 6
  • 70
  • 88
1

If we want to count consecutive characters without looping, we can make use of pandas:

In [1]: import pandas as pd

In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))

In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]

The key is to find the first elements that are different from their previous values and then make proper groupings in pandas:

In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))

In [7]: d.ne(d.shift())
Out[7]:
0     True
1     True
2    False
3     True
dtype: bool

In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0    1
1    2
2    2
3    3
dtype: int32
Alpha
  • 2,372
  • 3
  • 21
  • 23
0

This is my simple code for finding maximum number of consecutive 1's in binaray string in python 3:

count= 0
maxcount = 0
for i in str(bin(13)):
    if i == '1':
        count +=1
    elif count > maxcount:
        maxcount = count;
        count = 0
    else:
        count = 0
if count > maxcount: maxcount = count        
maxcount
  • 1
    This doesn't really answer the question. OP wants to count the number of consecutive characters for each character in the string. – nbryans Jan 25 '17 at 18:20
0

There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indicies.

w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]

print(dw)  # digits
['1', '0', '2', '3', '4']
print(cw)  # counts
[3, 3, 3, 2, 2, 5]

w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw)  # characters
print(cw)  # digits

['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
ShpielMeister
  • 1,417
  • 10
  • 23
0

A one liner that returns the amount of consecutive characters with no imports:

def f(x):s=x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}

That returns the amount of times any repeated character in a list is in a consecutive run of characters.

alternatively, this accomplishes the same thing, albeit much slower:

def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x-1>0 else None)]];return {h: t.count(h) for h in set(t)}

In terms of performance, I ran them with

site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))

which resulted in:

12.528256356999918
5.351301653001428

This method can definitely be improved, but without using any external libraries, this was the best I could come up with.

akime
  • 85
  • 3
0

In python

your_string = "wwwwweaaaawwbbbbn"
current = ''
count = 0
for index, loop in enumerate(your_string):
    current = loop
    count = count + 1
    if index == len(your_string)-1:
        print(f"{count}{current}", end ='')
        break

    if your_string[index+1] != current:
        print(f"{count}{current}",end ='')
        count = 0
        continue

This will output

5w1e4a2w4b1n
0
#I wrote the code using simple loops and if statement
s='feeekksssh' #len(s) =11
count=1  #f:0, e:3, j:2, s:3 h:1
l=[]
for i in range(1,len(s)): #range(1,10)
    if s[i-1]==s[i]:
        count = count+1
    else:
        l.append(count)
        count=1
    if i == len(s)-1: #To check the last character sequence we need loop reverse order
        reverse_count=1
        for i in range(-1,-(len(s)),-1): #Lopping only for last character
            if s[i] == s[i-1]:
                reverse_count = reverse_count+1
            else:
                l.append(reverse_count)
                break
print(l)
0

Today I had an interview and was asked the same question. I was struggling with the original solution in mind:

s = 'abbcccda'

old = ''
cnt = 0
res = ''
for c in s:
    cnt += 1
    if old != c:
        res += f'{old}{cnt}'
        old = c
        cnt = 0  # default 0 or 1 neither work
print(res)
#  1a1b2c3d1

Sadly this solution always got unexpected edge cases result(is there anyone to fix the code? maybe i need post another question), and finally timeout the interview.

After the interview I calmed down and soon got a stable solution I think(though I like the groupby best).

s = 'abbcccda'

olds = []
for c in s:
    if olds and c in olds[-1]:
        olds[-1].append(c)
    else:
        olds.append([c])
print(olds)
res = ''.join([f'{lst[0]}{len(lst)}' for lst in olds])
print(res)

#  [['a'], ['b', 'b'], ['c', 'c', 'c'], ['d'], ['a']]
#  a1b2c3d1a1
Lei Yang
  • 3,970
  • 6
  • 38
  • 59
0

Here is my simple solution:

def count_chars(s):
    size = len(s)
    count = 1
    op = ''
    for i in range(1, size):
        if s[i] == s[i-1]:
            count += 1
        else:
            op += "{}{}".format(count, s[i-1])
            count = 1
    if size:
        op += "{}{}".format(count, s[size-1])

    return op
0
data_input = 'aabaaaabbaaaaax'
start = 0
end = 0
temp_dict = dict()
while start < len(data_input):
  if data_input[start] == data_input[end]:
     end = end + 1
  if end == len(data_input):
     value = data_input[start:end]
     temp_dict[value] = len(value)
     break
  if data_input[start] != data_input[end]:
     value = data_input[start:end]
     temp_dict[value] = len(value)
     start = end
print(temp_dict)
akshay
  • 91
  • 1
  • 3
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 14 '22 at 21:03
0

PROBLEM: we need to count consecutive characters and return characters with their count.

def countWithString(input_string:str)-> str:
    count = 1
    output = ''
 
    for i in range(1,len(input_string)):
        if input_string[i]==input_string[i-1]:
            count +=1
        else:
            output += f"{count}{input_string[i-1]}"
            count = 1
    # Used to add last string count (at last else condition will not run and data will not be inserted to ouput string)
    output += f"{count}{input_string[-1]}"
    return output

countWithString(input)

input:'aaabbbaabbcc' output:'3a3b2a2b2c'

Time Complexity: O(n) Space Complexity: O(1)

iampritamraj
  • 196
  • 2
  • 6
0
temp_str = "aaaajjbbbeeeeewwjjj"
def consecutive_charcounter(input_str):
    counter = 0
    temp_list = []
    for i in range(len(input_str)):
        if i==0:
            counter+=1
        elif input_str[i]== input_str[i-1]:
            counter+=1
            if i == len(input_str)-1:
                temp_list.extend([input_str[i - 1], str(counter)])
        else:
            temp_list.extend([input_str[i-1],str(counter)])
            counter = 1
    print("".join(temp_list))

consecutive_charcounter(temp_str)

0

This is the best and fastest way to count repeated elements in string

    from collections import Counter
    magazine = "aab"
        char_count = Counter(magazine)
        
        print(dict(char_count))
Mohamed Fathallah
  • 1,274
  • 1
  • 15
  • 17