passing arguments to key function in itertools.groupby() to count unique values for keys

Question

I want to count the number of unique values of some parameter at each point in time, given two lists: one of values and one of timestamps. The timestamps contain millisecond information that is not relevant, so they must be converted to seconds. Right now I have something like this:

from itertools import groupby

timestamps = ['00:22:33:645', '00:22:33:655', '00:22:34:645', '00:22:34:745']
values = [1, 1, 2, 3]

grouped = groupby(zip(values, timestamps), lambda x: timestamp_to_seconds(x[1]))

but the resulting groups look like

{1353:[(1, '00:22:33:645'), (1, '00:22:33:655')], 1354:[(2, '00:22:34:645'), (3, '00:22:34:745')]}

and I would prefer to keep only {1353: [1, 1], 1354: [2, 3]} so that len(set(group)) gives an accurate count. Is there a way to pass the timestamps to the key function without putting them in the zip? Can the lambda be skipped?

Edit: added an actual example.

Please share a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) — yatu, Sep 18 '19 at 15:03
`groupby` always gives you tuples of `(key, group)`. Just unpack them as yatu suggests — C.Nivs, Sep 18 '19 at 15:06

pylang · Accepted Answer · 2019-09-18T18:22:02.650

With groupby, you would have to post-process the result to strip the timestamps from each group. Alternatively, you can skip groupby and collect the values in a defaultdict.

Given

import time
import datetime as dt
import collections as ct


timestamps = ["00:22:33:645", "00:22:33:655", "00:22:34:645", "00:22:34:745"]
values = [1, 1, 2, 3]


# Helper
def timestamp_to_seconds(ts: str) -> int:
    """Return an int in total seconds from a timestamp."""
    x = time.strptime(ts.rsplit(":", maxsplit=1)[0],"%H:%M:%S")
    res = dt.timedelta(hours=x.tm_hour, minutes=x.tm_min, seconds=x.tm_sec).total_seconds()
    return int(res)

Code

def regroup(tstamps: list, vals: list) -> dict:
    """Return a dict of seconds-value pairs."""
    dd = ct.defaultdict(list)

    for t, v in zip(tstamps, vals):
        dd[timestamp_to_seconds(t)].append(v)

    return dict(dd)

Demo

regroup(timestamps, values)
# {1353: [1, 1], 1354: [2, 3]}

{k: len(set(g)) for k, g in regroup(timestamps, values).items()}
# {1353: 1, 1354: 2}
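
For comparison, here is a rough sketch of the groupby post-processing mentioned at the top of this answer. It keeps the zip and the lambda from the question, then unpacks each (key, group) pair and keeps only the values. The helper is the one defined above, repeated so the block runs on its own. The names pairs and unique_counts are just illustrative. Note that groupby only merges adjacent items with equal keys, so this assumes the timestamps are already in order.

```python
import datetime as dt
import time
from itertools import groupby

timestamps = ["00:22:33:645", "00:22:33:655", "00:22:34:645", "00:22:34:745"]
values = [1, 1, 2, 3]


def timestamp_to_seconds(ts: str) -> int:
    """Same helper as above: drop the milliseconds, return total seconds."""
    x = time.strptime(ts.rsplit(":", maxsplit=1)[0], "%H:%M:%S")
    res = dt.timedelta(hours=x.tm_hour, minutes=x.tm_min, seconds=x.tm_sec)
    return int(res.total_seconds())


# Assumption: the pairs are already ordered by time. If they are not,
# sort them by the same key first, otherwise a repeated key would
# overwrite the earlier group in the dict below.
pairs = zip(values, timestamps)

grouped = {
    key: [value for value, _ts in group]  # keep the value, drop the timestamp
    for key, group in groupby(pairs, key=lambda pair: timestamp_to_seconds(pair[1]))
}
unique_counts = {key: len(set(vals)) for key, vals in grouped.items()}

print(grouped)        # {1353: [1, 1], 1354: [2, 3]}
print(unique_counts)  # {1353: 1, 1354: 2}
```

The key function only ever sees the items of the iterable being grouped, so the timestamps have to travel with the values in some form. The defaultdict version avoids that by not needing a key function at all.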

See also a post on converting timestamps to seconds.
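
As one possible alternative to the strptime-based helper, the conversion can also be sketched with plain string splitting. This assumes every timestamp has exactly the form HH:MM:SS:mmm. The function name timestamp_to_seconds_split is made up for this example. Unlike strptime with %H, it also accepts hour values above 23.

```python
def timestamp_to_seconds_split(ts: str) -> int:
    """Convert 'HH:MM:SS:mmm' to whole seconds, discarding the milliseconds.

    Assumes exactly four colon-separated numeric fields; anything else
    raises ValueError.
    """
    hours, minutes, seconds, _millis = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


print(timestamp_to_seconds_split("00:22:33:645"))  # 1353
print(timestamp_to_seconds_split("25:00:01:000"))  # 90001
```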
