0

I have a list that has approximately adjacent.

x=[10,11,13,70,71,73,170,171,172,174]

I need to separate this into lists which has minimum deviation (i.e)

y=[[10,11,13],[70,71,73],[170,171,172,174]]

You can see in y list grouped into 3 separate lists and break this list when meeting huge deviation. Can you give me a tip or any source to solve this?

Brad Solomon
  • 38,521
  • 31
  • 149
  • 235
  • 2
    Iterate through `x`. Keep track of the last-seen element and the current. Find if the current element is `deviation` above the previous. If it is, add it to a sublist. If not, new sublist. – Brad Solomon Oct 05 '21 at 01:33
  • 1
    Take a look at the approaches in https://stackoverflow.com/questions/18364026/clustering-values-by-their-proximity-in-python-machine-learning – jgmh Oct 05 '21 at 02:00

2 Answers2

2

the zip function is your friend when you need to compare items of a list with their successor or predecessor:

x=[10,11,13,70,71,73,170,171,172,174]

threshold = 50
breaks    = [i for i,(a,b) in enumerate(zip(x,x[1:]),1) if b-a>threshold]
groups    = [x[s:e] for s,e in zip([0]+breaks,breaks+[None])]

print(groups)
[[10, 11, 13], [70, 71, 73], [170, 171, 172, 174]]
  • breaks will contain the index (i) of elements (b) that are greater than their predecessor (a) by more than the treshold value.
  • Using zip() again allows you to pair up these break indexes to form start/end ranges which you can apply to the original list to get your groupings.

Note that i used a fixed threshold to detect a "huge" deviation, but you can use a percentage or any formula/condition of your choice in place of if b-a>threshold. If the deviation calculation is complex, you will probably want to make a deviates() function and use it in the list comprehension: if deviates(a,b) so that it remains intelligible

If zip() and list comprehensions are too advanced, you can do the same thing using a simple for-loop:

def deviates(a,b):  # example of a (huge) deviation detection function
    return b-a > 50  

groups   = []   # resulting list of groups
previous = None # track previous number for comparison
for number in x:
    if not groups or deviates(previous, number): 
        groups.append([number])   # 1st item or deviation, add new group 
    else:
        groups[-1].append(number) # approximately adjacent, add to last group
    previous = number             # remember previous value for next loop
Alain T.
  • 40,517
  • 4
  • 31
  • 51
  • 1
    Maybe in the deviate() function can add 3rd arguments as gap, to avoid hardcode and stay flexible. Like both. – Daniel Hao Oct 05 '21 at 10:50
0

Something like this should do the trick:

test_list = [10, 11, 13, 70, 71, 73, 170, 171, 172, 174]


def group_approximately_adjacent(numbers):
    if not numbers:
        return []

    current_number = numbers.pop(0)
    cluster = [current_number]
    clusters = [cluster]

    while numbers:
        next_number = numbers.pop(0)
        if is_approximately_adjacent(current_number, next_number):
            cluster.append(next_number)
        else:
            cluster = [next_number]
            clusters.append(cluster)
        current_number = next_number

    return clusters


def is_approximately_adjacent(a, b):
    deviation = 0.25
    return abs(a * (1 + deviation)) > abs(b) > abs(a * (1 - deviation))
Paul Whipp
  • 16,028
  • 4
  • 42
  • 54
  • 4
    I would recommend making a copy of the list at the start of the function so that it doesn't destroy the input (which may interfere with the calling code's logic). – Alain T. Oct 05 '21 at 05:41