How to group approximately adjacent list

Question

I have a list that has approximately adjacent.

x=[10,11,13,70,71,73,170,171,172,174]

I need to separate this into lists which has minimum deviation (i.e)

y=[[10,11,13],[70,71,73],[170,171,172,174]]

You can see in y list grouped into 3 separate lists and break this list when meeting huge deviation. Can you give me a tip or any source to solve this?

Iterate through `x`. Keep track of the last-seen element and the current. Find if the current element is `deviation` above the previous. If it is, add it to a sublist. If not, new sublist. — Brad Solomon, Oct 05 '21 at 01:33
Take a look at the approaches in https://stackoverflow.com/questions/18364026/clustering-values-by-their-proximity-in-python-machine-learning — jgmh, Oct 05 '21 at 02:00

Alain T. · Answer 1 · 2021-10-05T05:46:02.617

the zip function is your friend when you need to compare items of a list with their successor or predecessor:

x=[10,11,13,70,71,73,170,171,172,174]

threshold = 50
breaks    = [i for i,(a,b) in enumerate(zip(x,x[1:]),1) if b-a>threshold]
groups    = [x[s:e] for s,e in zip([0]+breaks,breaks+[None])]

print(groups)
[[10, 11, 13], [70, 71, 73], [170, 171, 172, 174]]

breaks will contain the index (i) of elements (b) that are greater than their predecessor (a) by more than the treshold value.
Using zip() again allows you to pair up these break indexes to form start/end ranges which you can apply to the original list to get your groupings.

Note that i used a fixed threshold to detect a "huge" deviation, but you can use a percentage or any formula/condition of your choice in place of if b-a>threshold. If the deviation calculation is complex, you will probably want to make a deviates() function and use it in the list comprehension: if deviates(a,b) so that it remains intelligible

If zip() and list comprehensions are too advanced, you can do the same thing using a simple for-loop:

def deviates(a,b):  # example of a (huge) deviation detection function
    return b-a > 50  

groups   = []   # resulting list of groups
previous = None # track previous number for comparison
for number in x:
    if not groups or deviates(previous, number): 
        groups.append([number])   # 1st item or deviation, add new group 
    else:
        groups[-1].append(number) # approximately adjacent, add to last group
    previous = number             # remember previous value for next loop

Maybe in the deviate() function can add 3rd arguments as gap, to avoid hardcode and stay flexible. Like both. — Daniel Hao, Oct 05 '21 at 10:50

score 0 · Answer 2 · answered Oct 05 '21 at 01:45

Something like this should do the trick:

test_list = [10, 11, 13, 70, 71, 73, 170, 171, 172, 174]


def group_approximately_adjacent(numbers):
    if not numbers:
        return []

    current_number = numbers.pop(0)
    cluster = [current_number]
    clusters = [cluster]

    while numbers:
        next_number = numbers.pop(0)
        if is_approximately_adjacent(current_number, next_number):
            cluster.append(next_number)
        else:
            cluster = [next_number]
            clusters.append(cluster)
        current_number = next_number

    return clusters


def is_approximately_adjacent(a, b):
    deviation = 0.25
    return abs(a * (1 + deviation)) > abs(b) > abs(a * (1 - deviation))

I would recommend making a copy of the list at the start of the function so that it doesn't destroy the input (which may interfere with the calling code's logic). — Alain T., Oct 05 '21 at 05:41

How to group approximately adjacent list

2 Answers2