6

Given a list

A = [1, 6, 13, 15, 17, 18, 19, 21, 29, 36, 53, 58, 59, 61, 63, 78, 79, 81, 102, 114]

is there an easy way to group all those clusters where the difference between successive elements is smaller than 3?

That is, obtain something like:

[13, 15, 17, 19, 21], [58, 59, 61, 63], [78, 79, 81]

I was wondering whether there exists any built-in function, but I couldn't find anything similar. I was trying to figure it out using the groupby of the itertools, but I'm stuck. Thank you in advance.

urgeo
  • 645
  • 1
  • 9
  • 19
  • 1
    So are you ignoring any group with less than 3 elements? – TYZ Jul 06 '18 at 13:32
  • Well, I'm not ignoring them. In this array just there weren't two stand-alone elements whose difference was less than 3. If there were, it was ok to group them too. I'm looking for something like an "until" or "while" the difference is X, then group them. – urgeo Jul 06 '18 at 13:35
  • 1
    So, 1 and 6 should not be considered singleton clusters? – user2390182 Jul 06 '18 at 13:43
  • It's not necessary. But if you want you can keep them as singleton clusters and I can neglect them in a second moment on the basis of their length. The important thing is to "store elements until" the condition is satisfied :) – urgeo Jul 06 '18 at 13:50
  • regarding "store elements until" `itertools.takewhile` does exactly that. – sytech Jul 06 '18 at 14:00

4 Answers4

2

You can use itertools.groupby:

import itertools
A = [1, 6, 13, 15, 17, 18, 19, 21, 29, 36, 53, 58, 59, 61, 63, 78, 79, 81, 102, 114]
new_a = [(A[i+1]-A[i], A[i]) for i in range(len(A)-1)]
a = [[a, [c for _, c in b]] for a, b in itertools.groupby(new_a, key=lambda x:x[0] < 3)]
final_groups = [a[i][-1]+[a[i+1][-1][0]] if a[i+1][-1][0] - a[i][-1][-1] < 3 else a[i][-1] for i in range(len(a)-1) if a[i][0]]

Output:

[[13, 15, 17, 18, 19, 21], [58, 59, 61, 63], [78, 79, 81]]
Ajax1234
  • 69,937
  • 8
  • 61
  • 102
  • hi Ajax, I test your solution with `A = [25,26,53,54,55,56]`, but the results is `[[25, 26]]`. I think you have a bug in your solution – Toan Nguyen Phuoc Sep 23 '21 at 06:35
  • I think this is a really smooth idea. However, I think the final_groups call is faulty. My suggestion would be to have the second index of new_a have the tuple (A[i],A[i+1]) instead of just A[i] and replacing final_groups with something like [[t[1][0][0]] + [q[-1] for q in t[1]] for t in a if t[0]] – Tarje Bargheer Nov 05 '21 at 12:19
1

This is one approach using an iteration.

Ex:

A = [1, 6, 13, 15, 17, 18, 19, 21, 29, 36, 53, 58, 59, 61, 63, 78, 79, 81, 102, 114]
res = []
temp = []
l = len(A)-1

for i,v in enumerate(A):
    if i+1 > l:
        break

    if abs(v - A[i+1]) < 3:
        temp.append(v)
    else:
        if temp:
            temp.append(v)
            res.append(temp)
            temp = []
print(res)

Output:

[[13, 15, 17, 18, 19, 21], [58, 59, 61, 63], [78, 79, 81]]
Rakesh
  • 81,458
  • 17
  • 76
  • 113
1

Similarly to this answer, which asked for runs of the same number, one can use numpy.split here:

import numpy as np

def plateaus(A, atol=3):
    runs = np.split(A, np.where(np.abs(np.diff(A)) >= atol)[0] + 1)
    return [list(x) for x in runs if len(x) > 1]

A = [1, 6, 13, 15, 17, 18, 19, 21, 29, 36, 53, 58, 59, 61, 63, 78, 79, 81, 102, 114]
print(plateaus(A))
[[13, 15, 17, 18, 19, 21], [58, 59, 61, 63], [78, 79, 81]]

Without filtering on the length, this gives you singleton clusters just like the itertools.takehwile approach by @sytech.

Graipher
  • 6,891
  • 27
  • 47
1

Based on your comment

The important thing is to "store elements until" the condition is satisfied

You could use itertools.takewhile for this.

takewhile(predicate, iterable) --> takewhile object

Return successive entries from an iterable as long as the predicate evaluates to true for each entry.

This solution certainly has room for improvement, but the takeaway is the use of takewhile

class Grouper:
    """simple class to perform comparison when called, storing last element given"""
    def __init__(self, diff):
        self.last = None
        self.diff = diff
    def predicate(self, item):
        if self.last is None:
            return True
        return abs(self.last - item) < self.diff
    def __call__(self, item):
        """called with each item by takewhile"""
        result = self.predicate(item)
        self.last = item
        return result


def group_by_difference(items, diff=3):
    results = []
    start = 0
    remaining_items = items
    while remaining_items:
        g = Grouper(diff)
        group = [*itertools.takewhile(g, remaining_items)]
        results.append(group)
        start += len(group)
        remaining_items = items[start:]
    return results

This gives you grouped items with singleton clusters.

[[1],
 [6],
 [13, 15, 17, 18, 19, 21],
 [29],
 [36],
 [53],
 [58, 59, 61, 63],
 [78, 79, 81],
 [102],
 [114]]
sytech
  • 29,298
  • 3
  • 45
  • 86