
The idea is to cluster similar or identical float numbers into one group (a list), while drastically different float numbers go into a different group. A number with no similar or identical neighbours should end up in a group of its own.

code1:

from itertools import groupby

x =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]

groups = [list(g) for _, g in groupby(x, key=int)]

output1:

groups
Out[129]: 
[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[67.32000732421875],
[76.55999755859375],
[85.79998779296875],
[147.83999633789062,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812],
[148.07998657226562],
[147.95999145507812,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812],
[199.07998657226562,
 199.07998657226562,
 199.07998657226562,
 199.07998657226562,
 199.07998657226562]]

What is good about this output is that it preserves the order of the float numbers. What is wrong is that, for instance, [148.07998657226562] is not grouped with the 147s (e.g. 147.83999633789062, 147.95999145507812).
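To see why the split happens, it may help to print the keys that groupby compares. The snippet below is only a sketch on a shortened sample taken from the list above: groupby starts a new group whenever the key of two neighbouring elements differs, and int() truncates 148.07... to 148.

```python
from itertools import groupby

# Shortened sample taken from the question's list x.
x = [147.83999633789062, 147.95999145507812,
     148.07998657226562, 147.95999145507812]

# The key computed for each element; groupby only merges
# neighbouring elements whose keys are equal.
keys = [int(v) for v in x]
groups = [list(g) for _, g in groupby(x, key=int)]

print(keys)         # [147, 147, 148, 147]
print(len(groups))  # 3 groups: the 148 key interrupts the run of 147s
```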

code2 (attempt to cluster):

import cluster
data =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]

cl = cluster.HierarchicalClustering(data, lambda x,y: abs(x-y))
cl.getlevel(1)

output2:

[[39.5999755859375],
 [48.84002685546875],
 [58.08001708984375],
 [85.79998779296875],
 [67.32000732421875],
 [76.55999755859375],
 [199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562],
 [148.07998657226562,
  147.83999633789062,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812]]

In this case, what is good about the output is its clustering. What is wrong is that the order is altered.

Order matters because these numbers represent coordinates, and the sequence (the x variable) has already been sorted. Any additional sorting before or after the grouping alters that original order.

The order must stay the same because the results are exported in that order.

desired output:

[[39.5999755859375],
[48.84002685546875],
[58.08001708984375],
[67.32000732421875],
[76.55999755859375],
[85.79998779296875],
[147.83999633789062,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812,
 148.07998657226562,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812,
 147.95999145507812],
[199.07998657226562,
 199.07998657226562,
 199.07998657226562,
 199.07998657226562,
 199.07998657226562]]
Ecko
  • You could use round/floor/ceil functions instead of a cast to int. – Hammond95 Dec 23 '20 at 14:02
  • Have you tried using round function instead? Since it's not possible – Ecko Dec 23 '20 at 14:16
  • Why would `148.079...` be together with the `147`s? What are the grouping criteria? `key=lambda f: f//10` gives your desired output but that might be a coincidence – Tomerikoo Dec 23 '20 at 14:17
  • Because these are coordinates, and the same sequence should be preserved, since I'm locating and exporting those images which are next to each other. That's the question: how to define the criteria inside the function so that it treats numbers within ±1 of each other as being in the same group (148 with the 147s). – Ecko Dec 23 '20 at 14:20
  • Maybe I misunderstood you. You want the resulting numbers to be ordered the same in the result as they are in the input. By this, you mean that within the group they should be ordered as in the input? – Brambor Dec 23 '20 at 14:35
  • @Brambor I've edited the post and added the desired output – Ecko Dec 23 '20 at 14:40
  • Is it required for the output to have the numbers in rising order? – Brambor Dec 23 '20 at 14:44

2 Answers


Your desired output can be achieved by altering the grouping key to be:

key=lambda f: f//10

But this groups the numbers by the ten they fall in (140 up to 150, 150 up to 160, and so on). So, for example, 146.56 and 148.2 will also be grouped together, while 149.9 and 150.1 will not. groupby only looks at each element individually and constructs a key from it. It has no "memory" of previous numbers, so if you need grouping relative to neighbouring values you will need to do it manually:

groups = []
group = [x[0]]
for num in x[1:]:
    if abs(group[-1] - num) <= 1:
        group.append(num)
    else:
        groups.append(group)
        group = [num]
groups.append(group)

Note that this compares each number with the last number added to the current group. So, in theory, a group can drift, e.g. [145.1, 146.0, 146.9, 147.8, ...], where the first and last elements differ by more than 1. If that is not desired, you can measure the difference from a fixed point instead, such as the first element of the group. Just change

if abs(group[-1] - num) <= 1:

to:

if abs(group[0] - num) <= 1:
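Both variants can be wrapped in one helper. The following is only a sketch: the function name group_by_tolerance and the parameters tol and anchor are made up for illustration and are not part of any library. Unlike the loop above, it also accepts an empty input.

```python
def group_by_tolerance(values, tol=1.0, anchor="last"):
    """Split an ordered sequence into runs of nearby values.

    anchor="last"  compares with the previous element (groups can drift).
    anchor="first" compares with the first element of the current group.
    """
    if anchor not in ("last", "first"):
        raise ValueError("anchor must be 'last' or 'first'")
    groups = []
    for v in values:
        if groups:
            # Reference point inside the group currently being built.
            ref = groups[-1][-1] if anchor == "last" else groups[-1][0]
            if abs(ref - v) <= tol:
                groups[-1].append(v)
                continue
        # First element overall, or too far from the reference: new group.
        groups.append([v])
    return groups


drifting = [145.1, 146.0, 146.9, 147.8]
print(group_by_tolerance(drifting, anchor="last"))
# [[145.1, 146.0, 146.9, 147.8]]
print(group_by_tolerance(drifting, anchor="first"))
# [[145.1, 146.0], [146.9, 147.8]]
```

With anchor="last" every neighbouring pair is within 1, so the whole run stays together. With anchor="first" the group is closed as soon as a value is more than 1 away from the element that opened it.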
Tomerikoo

If you want to keep grouping with some function and only need to trace the groups back to the original indexes, you could do something like this:

data =[39.5999755859375,48.84002685546875,58.08001708984375,67.32000732421875,76.55999755859375,85.79998779296875,147.83999633789062,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,148.07998657226562,147.95999145507812,147.95999145507812,147.95999145507812,147.95999145507812,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562,199.07998657226562]

# data_out is the clustered result from the question (output2, i.e. cl.getlevel(1))
data_out = [[39.5999755859375],
 [48.84002685546875],
 [58.08001708984375],
 [85.79998779296875],
 [67.32000732421875],
 [76.55999755859375],
 [199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562,
  199.07998657226562],
 [148.07998657226562,
  147.83999633789062,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812,
  147.95999145507812]]

reindex = []
for d in data:
    for i, group in enumerate(data_out):
        if d in group:
            reindex.append(i)
            break

print(reindex)

This prints [0, 1, 2, 4, 5, 3, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6], which are the group indexes of the original data, in input order.

I don't know what the application of your code is, but you probably want a dict next. Dicts are guaranteed to preserve insertion order since Python 3.7 (CPython 3.6 already did so as an implementation detail), so you can build a dict (data_dict) that maps each value to the index of its group:

data_dict = dict(zip(data, reindex))

returns:

{39.5999755859375: 0, 48.84002685546875: 1, 58.08001708984375: 2, 67.32000732421875: 4, 76.55999755859375: 5, 85.79998779296875: 3, 147.83999633789062: 7, 147.95999145507812: 7, 148.07998657226562: 7, 199.07998657226562: 6}

Maybe I misunderstood and you want the numbers to be ordered as they are in the input?

If so, you can get it from data_dict as simply as:

group_count = max(data_dict.values()) + 1
data_grouped = [[] for _ in range(group_count)]
for d in data:
    data_grouped[data_dict[d]].append(d)

print(data_grouped)

returns

[[39.5999755859375], [48.84002685546875], [58.08001708984375], [85.79998779296875], [67.32000732421875], [76.55999755859375], [199.07998657226562, 199.07998657226562, 199.07998657226562, 199.07998657226562, 199.07998657226562], [147.83999633789062, 147.95999145507812, 147.95999145507812, 147.95999145507812, 147.95999145507812, 148.07998657226562, 147.95999145507812, 147.95999145507812, 147.95999145507812, 147.95999145507812]]
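In the result above, the elements inside each group follow the input order, but the groups themselves still follow the clustering order (85.79... comes before 67.32...). If the groups should also appear in input order, one possible sketch is to create each group the first time its label is seen. The data and reindex lists below are shortened, hypothetical stand-ins for the ones above.

```python
# Hypothetical, shortened stand-ins for `data` and `reindex` from above.
data = [58.08, 67.32, 76.56, 85.8, 147.84, 148.08, 147.96]
reindex = [2, 4, 5, 3, 7, 7, 7]  # cluster label of each element

# A dict keeps its keys in first-insertion order, so the groups
# come out in the order in which each label first appears in the input.
ordered = {}
for value, label in zip(data, reindex):
    ordered.setdefault(label, []).append(value)

data_grouped = list(ordered.values())
print(data_grouped)
# [[58.08], [67.32], [76.56], [85.8], [147.84, 148.08, 147.96]]
```

This keys the dict on the group label rather than on the float value, so it does not rely on equal floats hashing to the same key.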

Once again: you haven't said what your code is for.

Note: This is not a very elegant solution.

Brambor