The idea is to group (cluster) float numbers that are equal or close to each other into one list, while drastically different float numbers go into a different list. A number with no similar or equal neighbours should end up in a group of its own.
code1:
from itertools import groupby
x =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]
groups = [list(g) for _, g in groupby(x, key=int)]
output1:
groups
Out[129]:
[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[67.32000732421875],
[76.55999755859375],
[85.79998779296875],
[147.83999633789062,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812],
[148.07998657226562],
[147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812],
[199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562]]
What is good about this output is that it preserves the order of the float numbers. What is wrong is that, for instance, [148.07998657226562] is not grouped with the 147s (e.g. 147.83999633789062, 147.95999145507812).
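To make the cause of that split concrete, here is a small sketch using only a four-value slice of the data above. It relies on two documented behaviours: int() truncates a float to its integer part, and groupby only merges consecutive items whose keys are equal. The variable names (values, keys, groups) are just for illustration.

```python
from itertools import groupby

# A short slice of the original data around the problematic value.
values = [147.83999633789062, 147.95999145507812,
          148.07998657226562, 147.95999145507812]

# int() truncates toward zero, so the key is just the integer part.
keys = [int(v) for v in values]

# groupby starts a new group every time the key changes, so the single
# 148.x value is cut out and the 147.x run is split in two around it.
groups = [list(g) for _, g in groupby(values, key=int)]

print(keys)
print(groups)
```

So the grouping depends on where the integer boundary falls, not on how close the values actually are to each other.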
code2 (an attempt at clustering):
import cluster
data =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]
cl = cluster.HierarchicalClustering(data, lambda x,y: abs(x-y))
cl.getlevel(1)
output2:
[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[85.79998779296875],
[67.32000732421875],
[76.55999755859375],
[199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562],
[148.07998657226562,
147.83999633789062,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812]]
In this case, what is good about the output is the clustering. What is wrong is that the order is altered.
The order matters because these numbers represent coordinates, and the sequence (the x variable) has already been sorted previously. Any additional sorting before or after the grouping would therefore alter the original order of x.
Keeping the same order is important because the values are exported in that order.
desired output:
[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[67.32000732421875],
[76.55999755859375],
[85.79998779296875],
[147.83999633789062,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812,
148.07998657226562,
147.95999145507812,
147.95999145507812,
147.95999145507812,
147.95999145507812],
[199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562,
199.07998657226562]]
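For reference, one possible way to get this shape is sketched below. It is not a definitive solution, and it rests on assumptions that are not stated above: "similar" is taken to mean "within some tolerance of the previous value in the sequence", and the tolerance of 1.0 is picked by eye for this data. The function name group_by_gap and the tolerance parameter are hypothetical.

```python
def group_by_gap(values, tolerance=1.0):
    """Split values into consecutive runs, keeping the original order.

    A value joins the current group when it differs from the previous
    value by at most `tolerance` (an assumed threshold); otherwise it
    starts a new group. Nothing is sorted or reordered.
    """
    groups = []
    for v in values:
        if groups and abs(v - groups[-1][-1]) <= tolerance:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


x = [39.5999755859375, 48.84002685546875, 58.08001708984375,
     67.32000732421875, 76.55999755859375, 85.79998779296875,
     147.83999633789062, 147.95999145507812, 147.95999145507812,
     147.95999145507812, 147.95999145507812, 148.07998657226562,
     147.95999145507812, 147.95999145507812, 147.95999145507812,
     147.95999145507812, 199.07998657226562, 199.07998657226562,
     199.07998657226562, 199.07998657226562, 199.07998657226562]

groups = group_by_gap(x, tolerance=1.0)
for g in groups:
    print(g)
```

Because each value is compared with the one just before it, a long run of slowly drifting values would stay in a single group. Comparing against the first value of the current group instead would cap how far a group can spread, at the cost of possibly splitting such runs.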