
I have a dataset organized into a dictionary of lists, like:

{ UUID: [3, 3, 5, 3, 0, 0, 3, 3, 2, 3, 2, 1, 1, 0, 2, 0, 5, 0, 0, 0, 0, 3, 4, 1, 2], 
  UUID: [1, 2, 3, 1, 0, 0, 2] }

I want to detect runs of consecutive identical values (especially 0s), specifically runs of at least n identical values in a row.

For example, if n were 3 and the value was 0, I would append the UUID of the first key:value pair to a list of qualifying UUIDs, but not the second.

What's the most efficient way to detect consecutive identical values in this way?

kojiro
Thain
  • What happens if there's more than one viable candidate? – Jon Clements Mar 10 '14 at 20:06
  • I think you're just going to have to iterate through each key and each value in the list to find the consecutive values. I don't think there are any tricks that make it faster than that. – bwbrowning Mar 10 '14 at 20:08
  • Do you mean within a list? E.g. if there are 3 consecutive 0's twice in a list. I'd add the UUID to the list once. It's a binary distinction; either the key:value pair qualifies (contains >=1 instance of consecutive identical integers) or it doesn't. – Thain Mar 10 '14 at 20:08
  • Also see https://stackoverflow.com/questions/38708692/identify-if-list-has-consecutive-elements-that-are-equal-in-python – PM 2Ring Jun 07 '17 at 13:59

2 Answers


Use itertools.groupby to detect runs of consecutive identical values:

uuids = { 'a': [3, 3, 5, 3, 0, 0, 3, 3, 2, 3, 2, 1, 1, 0, 2, 0, 5, 0, 0, 0, 0, 3, 4, 1, 2], 
  'b': [1, 2, 3, 1, 0, 0, 2]}

from itertools import groupby 

def detect_runs_in_dict(d, n=3):
    # keep each key whose list contains a run of at least n equal values
    return [uuid for uuid, val in d.items()  # in Python 2, use .iteritems()
        if any(len(list(g)) >= n for k, g in groupby(val))]

demo

detect_runs_in_dict(uuids)
Out[28]: ['a']

detect_runs_in_dict(uuids,n=2)
Out[29]: ['a', 'b']

This doesn't restrict which value may form a run. If you want to specify the value, that's straightforward to add:

def detect_runs_in_dict(d, n=3, searchval=0):
    return [uuid for uuid, val in d.items() 
        if any(k == searchval and len(list(g)) >= n for k,g in groupby(val))]
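
Since the question asks about efficiency, here is a minimal sketch of one possible refinement: it stops counting a run once n items have been seen, using itertools.islice, so a long run is not copied in full just to measure it. The helper name has_run is hypothetical and not part of the answer above, and whether this is measurably faster depends on the data.

```python
from itertools import groupby, islice

def has_run(values, n=3, searchval=0):
    """Return True if values contains at least n consecutive searchval items."""
    for key, group in groupby(values):
        # islice pulls at most n items from the group, so a long run is
        # never materialized beyond the length we care about.
        if key == searchval and len(list(islice(group, n))) >= n:
            return True
    return False

def detect_runs_in_dict(d, n=3, searchval=0):
    # Same interface as the function above; only the run test differs.
    return [uuid for uuid, val in d.items() if has_run(val, n, searchval)]

uuids = {
    'a': [3, 3, 5, 3, 0, 0, 3, 3, 2, 3, 2, 1, 1, 0, 2, 0, 5, 0, 0, 0, 0, 3, 4, 1, 2],
    'b': [1, 2, 3, 1, 0, 0, 2],
}

print(detect_runs_in_dict(uuids))        # ['a']
print(detect_runs_in_dict(uuids, n=2))   # ['a', 'b']
```

The early return also means that scanning a list stops at the first qualifying run, which matches the either-it-qualifies-or-it-doesn't requirement stated in the comments.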
roippi

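You can use itertools.groupby to get the length of the longest run of a given value (here 0) this way:

from itertools import groupby

# default=0 (Python 3.4+) covers lists that contain no 0 at all
max(
  (len(list(g)) for k, g in groupby(_list) if k == 0),
  default=0
)

Measure each group as soon as groupby yields it: a group iterator is emptied once groupby advances, so its length cannot be taken later. You can compare the result against the desired length, or filter out runs shorter than the desired length instead of taking the maximum.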
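
To connect this with the dictionary in the question, here is a minimal sketch that computes the longest run of a value for each list and keeps the keys whose longest run reaches n. The names longest_run and qualifying_keys are hypothetical, and the default= argument of max assumes Python 3.4 or later.

```python
from itertools import groupby

def longest_run(values, searchval=0):
    """Length of the longest run of searchval in values (0 if it never occurs)."""
    return max(
        # Each group is measured immediately, before groupby advances.
        (len(list(group)) for key, group in groupby(values) if key == searchval),
        default=0,
    )

def qualifying_keys(d, n=3, searchval=0):
    # Keep the keys whose list has a run of searchval of length >= n.
    return [key for key, values in d.items() if longest_run(values, searchval) >= n]

data = {
    'a': [3, 3, 5, 3, 0, 0, 3, 3, 2, 3, 2, 1, 1, 0, 2, 0, 5, 0, 0, 0, 0, 3, 4, 1, 2],
    'b': [1, 2, 3, 1, 0, 0, 2],
}

print(longest_run(data['a']))   # 4
print(longest_run(data['b']))   # 2
print(qualifying_keys(data))    # ['a']
```

Unlike an any()-based test, this always scans the whole list, so it is mainly useful when the run length itself is needed.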

kojiro