
I have four sets of coordinate points, shown below as an example. My goal is to combine the sets automatically, based on their similarity, rather than manually.

set_0 = {(540, 413), (30, 223), (177, 254), (40, 206), (31, 203), (1118, 652), (208, 260), (374, 350), (64, 212), (528, 411), (825, 527), (253, 304), (1086, 638), (316, 326), (84, 242), (998, 598), (0, 215), (183, 254), (57, 212), (215, 289), (245, 299), (90, 221), (966, 584), (843, 535), (1081, 638), (1137, 664), (37, 203), (126, 256), (895, 554), (541, 416), (1150, 667), (1190, 685), (0, 193), (176, 251), (710, 478), (73, 241), (169, 251), (1099, 646), (6, 193), (532, 410), (65, 215), (193, 260), (118, 230), (201, 286), (961, 584), (1185, 685), (979, 592), (47, 206), (822, 523), (1255, 719), (225, 295), (556, 419), (206, 263), (601, 439), (830, 527), (769, 505), (111, 230)}
set_1 = {(1241, 689), (1084, 614), (1190, 662), (177, 254), (710, 455), (253, 304), (316, 326), (1155, 649), (84, 242), (526, 383), (1110, 626), (0, 215), (884, 527), (183, 254), (57, 212), (126, 256), (806, 494), (491, 372), (1279, 704), (1018, 584), (823, 503), (1033, 593), (502, 374), (1177, 659), (569, 403), (564, 398), (208, 260), (1097, 623), (64, 212), (1249, 693), (729, 465), (751, 471), (913, 542), (1001, 579), (1265, 701), (590, 409), (377, 329), (176, 251), (6, 193), (1142, 641), (830, 503), (1246, 689), (1255, 719), (1081, 615), (30, 223), (610, 417), (977, 569), (215, 289), (245, 299), (90, 221), (601, 416), (1209, 674), (37, 203), (1193, 666), (1156, 647), (1113, 630), (521, 383), (169, 251), (65, 215), (1124, 633), (1078, 611), (1038, 593), (201, 286), (1094, 619), (206, 263), (657, 437), (1279, 719), (1006, 579), (111, 230), (40, 206), (31, 203), (374, 350), (833, 507), (768, 481), (942, 551), (1214, 674), (1017, 586), (829, 505), (0, 193), (73, 241), (553, 396), (193, 260), (1065, 608), (118, 230), (47, 206), (998, 575), (225, 295), (478, 365), (1137, 641)}
set_2 = {(929, 676), (793, 569), (529, 416), (377, 355), (588, 447), (726, 527), (889, 652), (926, 671), (718, 523), (904, 631), (534, 442), (718, 550), (995, 715), (870, 638), (582, 470), (0, 151), (976, 674), (662, 517), (152, 202), (1005, 719), (665, 522), (950, 685), (961, 667), (182, 218), (1033, 710), (956, 689), (913, 638), (697, 541), (942, 653), (753, 573), (905, 634), (154, 232), (65, 184), (1028, 704), (1027, 706), (830, 588), (670, 523), (643, 481), (1050, 719), (988, 680), (337, 307), (785, 592), (497, 424), (39, 140), (153, 205), (208, 260), (971, 700), (644, 479), (54, 147), (119, 212), (630, 471), (30, 134), (622, 467), (433, 387), (585, 448), (201, 257), (166, 235), (1016, 698), (729, 559), (17, 130), (846, 625), (273, 272), (873, 643), (729, 531), (174, 214), (857, 634), (121, 187), (377, 329), (806, 601), (518, 433), (990, 709), (790, 592), (502, 424), (654, 485), (89, 170), (61, 154), (25, 134), (18, 128), (70, 184), (934, 676), (798, 569), (190, 249), (534, 416), (382, 355), (809, 578), (870, 612), (1017, 701), (94, 196), (894, 652), (913, 665), (742, 536), (206, 257), (905, 661), (198, 253), (977, 677), (825, 616), (753, 545), (46, 143), (769, 581), (57, 152), (25, 163), (528, 411), (104, 175), (150, 227), (169, 240), (105, 204), (1041, 715), (236, 248), (585, 475), (622, 494), (966, 694), (569, 466), (1038, 710), (721, 527), (702, 541), (414, 373), (529, 442), (713, 550), (750, 569), (634, 504), (638, 504), (918, 638), (657, 517), (30, 163), (686, 532), (334, 303), (987, 682), (83, 193), (790, 564), (982, 677), (945, 685), (177, 218), (106, 176), (401, 368), (782, 587), (681, 532), (494, 419), (393, 364), (737, 564), (601, 457), (4, 122), (41, 171), (734, 559), (374, 350), (582, 443), (33, 139), (270, 268), (689, 508), (218, 239), (217, 241), (955, 691), (241, 253), (206, 231), (625, 471), (6, 151), (766, 577), (1022, 701), (398, 364), (118, 209), (561, 434), (841, 598), (51, 149), (910, 661), (883, 622), (830, 616), (886, 648), (993, 686), 
(841, 625), (105, 178), (62, 152), (45, 145), (0, 122), (801, 601), (513, 433), (985, 709), (649, 485), (929, 649), (185, 222), (54, 175), (193, 253)}
set_3 = {(1241, 689), (1084, 614), (1190, 662), (710, 455), (1155, 649), (526, 383), (1110, 626), (884, 527), (1137, 664), (1190, 685), (806, 494), (491, 372), (1279, 704), (1018, 584), (556, 419), (823, 503), (1033, 593), (540, 413), (502, 374), (1177, 659), (569, 403), (564, 398), (1097, 623), (729, 465), (1249, 693), (751, 471), (913, 542), (1001, 579), (1265, 701), (843, 535), (1081, 638), (895, 554), (541, 416), (590, 409), (377, 329), (1142, 641), (830, 503), (1246, 689), (961, 584), (979, 592), (1255, 719), (1081, 615), (610, 417), (1118, 652), (528, 411), (825, 527), (977, 569), (601, 416), (966, 584), (1209, 674), (1193, 666), (1156, 647), (1113, 630), (521, 383), (532, 410), (1124, 633), (1078, 611), (1038, 593), (1185, 685), (822, 523), (1094, 619), (601, 439), (657, 437), (1279, 719), (769, 505), (1006, 579), (833, 507), (768, 481), (942, 551), (1086, 638), (998, 598), (1214, 674), (1150, 667), (1017, 586), (829, 505), (710, 478), (1099, 646), (553, 396), (1065, 608), (998, 575), (830, 527), (478, 365), (1137, 641)}

To calculate the similarity between the sets, I use the following function:

import math

def counter_cosine_similarity(c1, c2):
    # c1 and c2 must be dict-like (e.g. collections.Counter), mapping item -> count
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    return dotprod / (magA * magB)
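Since the function calls `.get()`, it cannot take plain sets directly. A minimal sketch of one way to feed sets into it, assuming each set is wrapped in a `collections.Counter` so that every point has a count of 1 (the helper name `set_cosine_similarity` and the toy data are hypothetical, not from the original code):

```python
import math
from collections import Counter

def counter_cosine_similarity(c1, c2):
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0) ** 2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0) ** 2 for k in terms))
    return dotprod / (magA * magB)

def set_cosine_similarity(a, b):
    # Hypothetical helper: wrap each set in a Counter (every point counts once)
    return counter_cosine_similarity(Counter(a), Counter(b))

# Toy data: two sets of four points sharing two points
a = {(0, 0), (1, 1), (2, 2), (3, 3)}
b = {(2, 2), (3, 3), (4, 4), (5, 5)}
sim = set_cosine_similarity(a, b)  # 2 / sqrt(4 * 4) = 0.5
```

With counts of 1, this reduces to the number of shared points divided by the square root of the product of the set sizes. The result is a fraction between 0 and 1; the values reported in the question look like percentages, which would suggest an extra factor of 100 somewhere, but that is an assumption.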

And here is the similarity result:

List_A: set_0

List_B: set_1

Similarity: 44.92804769564075
-----------------
List_A: set_0

List_B: set_2

Similarity: 2.9617443887954615
-----------------
List_A: set_0

List_B: set_3

Similarity: 37.80044164146765
-----------------
List_A: set_1

List_B: set_2

Similarity: 2.370227315699886
-----------------
List_A: set_1

List_B: set_3

Similarity: 67.48293207593244
-----------------
List_A: set_2

List_B: set_3

Similarity: 1.6362689789127198
-----------------

From these results, I would like set_0, set_1 and set_3 to be combined and set_2 to be left as a unique set, so that only two sets remain in the final result.

Is there a way to do this automatically? Thanks!
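One possible approach, sketched here under assumptions rather than as a definitive answer: treat each set as a node, link two nodes when their similarity reaches a threshold, and merge every connected group. This matches the clarification in the comments that sets linked through a common neighbour should still end up together. The function names, the toy data and the threshold value below are all hypothetical; the similarity is the cosine similarity of the sets treated as 0/1 vectors.

```python
import math
from itertools import combinations

def set_similarity(a, b):
    # Cosine similarity of two sets treated as 0/1 vectors
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def group_by_similarity(sets, threshold):
    # Union-find over set indices: similar sets end up in the same group
    parent = list(range(len(sets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(sets)), 2):
        if set_similarity(sets[i], sets[j]) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(len(sets)):
        groups.setdefault(find(i), []).append(i)
    return [groups[root] for root in sorted(groups)]

def merge_groups(sets, groups):
    # Union of all sets in each group
    return [set().union(*(sets[i] for i in group)) for group in groups]

# Toy data: sets 0 and 1 overlap, sets 1 and 3 overlap, set 2 is isolated
toy = [
    {(0, 0), (1, 1), (2, 2), (3, 3)},
    {(2, 2), (3, 3), (4, 4), (5, 5)},
    {(9, 9), (8, 8), (7, 7)},
    {(4, 4), (5, 5), (6, 6)},
]
groups = group_by_similarity(toy, threshold=0.3)  # [[0, 1, 3], [2]]
merged = merge_groups(toy, groups)
```

The threshold has to be chosen from the data. Going by the similarity values reported in the question, any cut-off between roughly 3% and 37% would keep set_2 apart from the other three, assuming those values are percentages of the same measure.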

UPDATE 1

I found this function in "Python: simple list merging based on intersections", and it does what I wanted. The only difference is that I raised the required number of common elements instead of using 0.

def merge(lsts):
    sts = [set(l) for l in lsts]
    i = 0
    while i < len(sts):
        j = i+1
        while j < len(sts):
            if len(sts[i].intersection(sts[j])) > 10: # Change the number of intersections
                sts[i] = sts[i].union(sts[j])
                sts.pop(j)
            else:
                j += 1
        i += 1
    lst = [list(s) for s in sts]
    return lst

However, because the computation uses sets instead of lists, the order of the coordinate points is lost. Is there another way to do the same thing as the code above while keeping the order?
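A minimal sketch of an order-preserving variant, assuming the inputs are lists (or tuples) of points rather than set literals, since a set has already lost the original order by the time it is created. Sets are used only for the intersection test, and `dict.fromkeys` removes duplicates while keeping first-seen order. The name `merge_keep_order` and the parameter `min_common` are hypothetical.

```python
def merge_keep_order(lsts, min_common=10):
    # Deduplicate each input while keeping its original order
    merged = [list(dict.fromkeys(l)) for l in lsts]
    i = 0
    while i < len(merged):
        j = i + 1
        while j < len(merged):
            # Sets are used only to count the common points
            if len(set(merged[i]) & set(merged[j])) > min_common:
                # Append the new points of merged[j] after those of merged[i]
                merged[i] = list(dict.fromkeys(merged[i] + merged[j]))
                merged.pop(j)
            else:
                j += 1
        i += 1
    return merged

# Toy data: a and b share two points, c shares none
a = [(3, 3), (1, 1), (2, 2)]
b = [(2, 2), (1, 1), (5, 5)]
c = [(9, 9)]
result = merge_keep_order([a, b, c], min_common=1)
# [[(3, 3), (1, 1), (2, 2), (5, 5)], [(9, 9)]]
```

The merged list keeps the points of the first list in their original order, followed by the points of the second list that were not already present.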

Hang
  • If both (A,B) is close, and (B, C) is close, implies (A, C) is close, you should be able to easily do this using some thresholding function – IanQ Aug 19 '21 at 03:33
  • `(A,B) is close, and (B, C) is close, implies (A, C) is close` - not necessarily – S P Sharan Aug 19 '21 at 03:35
  • How do you define similar? Do you mean isolate a set if that set is 30 distance away from all other groups? What is the threshold? I need the exact rule! – wong.lok.yin Aug 19 '21 at 04:59
  • @jasonwong What I mean similar is that the sets share a lot of common coordinate points. While combining the coordinate points in set_0, set_1 and set_3, I can plot the desired contour. Thank you~ – Hang Aug 19 '21 at 05:04
  • @SPSharan Sorry, I definitely could have phrased that better. I meant that IF that was the case, then OP should be able to apply that thresholding. I didn't mean to say that it was true – IanQ Aug 19 '21 at 05:19
  • So, `set_0` is similar to both `set_1` and `set_3`, and `set_1` and `set_3` are similar too, and hence you want to combine the three of them. But suppose `set_1` and `set_3` weren't similar to each other (still being both similar to `set_0`): what would you do? Make one combination of (`set_0`, `set_1`) and another one of (`set_0`, `set_3`)? – gimix Aug 19 '21 at 09:13
  • @gimix In this case, still combine three of them as both `set_1` and `set_3` still similar to `set_0` – Hang Aug 19 '21 at 09:59
  • Sorry, but I think you definitely need to specify better: will the sets always be 4 or can be any number? What happens in the case i cited before, or if the similarities are `set_0` with `set_1` and `set_2` with `set_3`? And what are you passing to your `counter_cosine_similarity` function? Not sets, because sets don't have a `get()` method: dictionaries? Where did you get them? – gimix Aug 19 '21 at 12:39
