Counting the number of consecutive occurrences and their respective quantities in a list (Python)

Question

As I said in the title, I want to count the consecutive occurrences of each value in a list, together with their respective quantities.

For example:

['a','b','a','a'] should output [('a',1),('b',1),('a',2)] (or an equivalent format)

['a','a','b','b','b','d'] should output [('a', 2), ('b', 1),('d',1)]

I need this because I'm counting the number of consecutive occurrences in a specific column of a time series, but this problem is equivalent.

This is what I did:

list_to_summarize = ['a','a','b','b','b','d']

def summary_of_list(list_to_summarize):
    list_values = []
    list_quantities = []
    
    c = 0
    for index,value in enumerate(list_to_summarize):
        # base case
        if not list_values:
            list_values.append(value)
            c += 1
            continue

        # middle cases
        if index < len(list_to_summarize)-1:

            #if the last value is the same as the current value we add one to the counter
            if (list_values[-1] == value):
                c += 1

            #if the last value is different from the current value we add the last value to the list and reset the counter
            elif list_values[-1] != value:
                list_values.append(value)
                list_quantities.append(c)
                c = 0

        # Final Cases
        # if the value is the same as the last one but it is the last one we add one to the counter and we add the value and the counter to the lists
        if (index == len(list_to_summarize)-1):
            if list_values[-1] == value:
                c += 1
                list_quantities.append(c)
                list_values.append(value)
            else:
                list_quantities.append(1)
                list_values.append(value)
    return list(zip(list_values, list_quantities))

I'm close, because on this example:

list_to_summarize = ['a','a','b','b','b','d']
summary_of_list(list_to_summarize)

outputs

[('a', 2), ('b', 1)]

Although this solution could be completed, I'm pretty sure this can be done in a more Pythonic manner. Thanks in advance.

I think the second example you provide should result in `[('a', 2), ('b', 3),('d',1)]` am I right? — Bill, Oct 12 '22 at 02:40

score 4 · Accepted Answer · answered Oct 12 '22 at 02:31

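You can use itertools.groupby, which groups consecutive equal elements, so counting the items in each group gives the length of each run: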

from itertools import groupby

def summary_of_list(lst):
    return [(k, sum(1 for _ in g)) for k, g in groupby(lst)]

print(summary_of_list(['a','b','a','a'])) # [('a', 1), ('b', 1), ('a', 2)]
print(summary_of_list(['a','a','b','b','b','d'])) # [('a', 2), ('b', 3), ('d', 1)]

(I believe your expected output [('a',2), ('b',1), ('d',1)] for ['a','a','b','b','b','d'] had a typo.)
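
If you would rather keep an explicit loop, here is a minimal sketch of the same run-length counting without itertools. The name `summary_of_list_manual` is just illustrative, and it assumes the items can be compared with `==`. Tracking the current run as the last entry of the result avoids the separate first-element and last-element cases in the question's code.

```python
def summary_of_list_manual(lst):
    """Return (value, run_length) pairs for runs of consecutive equal items."""
    runs = []  # each entry is a mutable [value, count] pair
    for value in lst:
        if runs and runs[-1][0] == value:
            # Same value as the current run: extend its count.
            runs[-1][1] += 1
        else:
            # A different value (or the very first item) starts a new run.
            runs.append([value, 1])
    # Convert to tuples to match the output format used above.
    return [tuple(run) for run in runs]

print(summary_of_list_manual(['a', 'b', 'a', 'a']))  # [('a', 1), ('b', 1), ('a', 2)]
print(summary_of_list_manual(['a', 'a', 'b', 'b', 'b', 'd']))  # [('a', 2), ('b', 3), ('d', 1)]
```

The groupby version is shorter, but this form may be easier to adapt if you need extra per-run bookkeeping.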
