
I have a list of numbers that looks like this:

numbers = [406.82, 406.93, 406.80, 406.89,
           443.22, 443.27, 
           415.01, 415.12, 415.2,
           443.71, 443.83,
           451.05, 451.14]

I want to group them based on how close they are:

numbers_grouped = [[406.82, 406.93, 406.80, 406.89],
                   [443.22, 443.27],
                   [415.01, 415.12, 415.2],
                   [443.71, 443.83],
                   [451.05, 451.14]]

I tried this method, but it doesn't seem to work:

  1. sort the list in ascending order
  2. subtract each number from its neighbouring number
  3. if the difference is less than 0.1, group the numbers together; otherwise don't (a sketch of this is below)
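
In plain Python my attempt looks roughly like this (a rough sketch of the steps above; the group_by_gap name is just for illustration, and 0.1 is the threshold I tried):

def group_by_gap(values, threshold=0.1):
    # sort first, then start a new group whenever the gap to the
    # previous value is not smaller than the threshold
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev < threshold:
            groups[-1].append(cur)  # close enough: same group
        else:
            groups.append([cur])    # gap too large: new group
    return groups

print(group_by_gap(numbers))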

But is there a better method to solve this problem?

Henul
  • Why is 415.01 grouped with 415.12 when it's more than 0.1 apart? Either way, you probably want something like https://stackoverflow.com/a/71678011/3483203 – user3483203 Aug 31 '22 at 14:22
  • Your method is good, but the threshold you use, 0.1, is too arbitrary. You need to find a way to calculate an appropriate threshold to better fit your data. – Stef Aug 31 '22 at 14:23
  • Related: [stackoverflow: Clustering values by their proximity in python](https://stackoverflow.com/questions/18364026/clustering-values-by-their-proximity-in-python-machine-learning), [pypi: kmeans1d](https://pypi.org/project/kmeans1d/), [stats.stackexchange: How to find the number of clusters in 1d data and the mean of each](https://stats.stackexchange.com/questions/79314/how-to-find-the-number-of-clusters-in-1d-data-and-the-mean-of-each) – Stef Aug 31 '22 at 14:33
  • With your data, any threshold between 0.13 and 0.43 would work. But 0.1 is too small. – Stef Aug 31 '22 at 14:34
  • See also this: [How would one use Kernel Density Estimation as a 1D clustering method in scikit learn?](https://stackoverflow.com/a/35151947/3080723) – Stef Aug 31 '22 at 14:38
  • @user3483203 that's exactly it! – Henul Aug 31 '22 at 16:12

1 Answer


Your method should actually work well. As you've tagged the question with numpy, I assume this is the library you want to use. We can easily find the "boundaries" where we have to cut up the sorted list by using np.diff, and then use np.cumsum to compute the group index of each element. It is not extremely efficient, but it is quite concise. Note that the output is a list of numpy arrays, as numpy arrays cannot be jagged:

import numpy as np
numbers = [406.82, 406.93, 406.80, 406.89,
           443.22, 443.27,
           415.01, 415.12, 415.2,
           443.71, 443.83,
           451.05, 451.14]
numbers.sort()
num = np.array(numbers)
# a gap larger than 0.1 marks a group boundary; the cumulative sum of the
# boundary flags then gives the group index of every element
groups = np.concatenate([[0], np.cumsum(np.diff(num) > 0.1)])
grouped = [num[groups == ind] for ind in range(groups.max() + 1)]  # extract groups
print(grouped)  # list of numpy arrays
flawr
  • This answer uses the same arbitrary threshold 0.1 as the OP, which was actually the source of the issue. For instance, 443.71 and 443.83 won't be grouped together, because 443.83 - 443.71 = 0.12 > 0.1. – Stef Aug 31 '22 at 15:51
  • A possible solution would be to examine the values in `np.diff(num)` and try to automatically determine an appropriate threshold from those values. – Stef Aug 31 '22 at 15:52
  • Thank you, I didn't see that particular point; I interpreted OP's question as a technical one about the implementation. I agree another method might be more suitable for the problem, but without OP providing more information I don't think it makes sense to suggest other approaches. – flawr Aug 31 '22 at 15:56
  • Thank you @flawr for the answer, but this is what I was looking for: https://stackoverflow.com/a/71678011/3483203 – Henul Aug 31 '22 at 16:12
  • This is a direct copy of a linked answer – user3483203 Aug 31 '22 at 16:28
  • @user3483203 I'm sorry, I hadn't seen any comments yet when I opened the question, but I agree it is pretty much *the* most concise way of doing it. Note that, in contrast to the answer you linked, you can use `np.diff` as in my answer; you don't have to reimplement it manually. – flawr Aug 31 '22 at 18:02
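
For reference, the same diff-based split discussed in these comments can also be written with `np.split` instead of the cumsum mask. This is a sketch of that variant (it reuses `numbers` from above and inherits the same 0.1-threshold caveat raised by Stef):

import numpy as np

num = np.sort(np.array(numbers))
# indices where the gap to the previous element exceeds the threshold;
# np.split cuts the sorted array just before each of those positions
boundaries = np.where(np.diff(num) > 0.1)[0] + 1
grouped = np.split(num, boundaries)
print(grouped)  # list of numpy arrays, same grouping as the cumsum version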