I have four sets of coordinate points, shown below as an example. My goal is to merge the sets automatically, based on their similarity, rather than doing it manually.
set_0 = {(540, 413), (30, 223), (177, 254), (40, 206), (31, 203), (1118, 652), (208, 260), (374, 350), (64, 212), (528, 411), (825, 527), (253, 304), (1086, 638), (316, 326), (84, 242), (998, 598), (0, 215), (183, 254), (57, 212), (215, 289), (245, 299), (90, 221), (966, 584), (843, 535), (1081, 638), (1137, 664), (37, 203), (126, 256), (895, 554), (541, 416), (1150, 667), (1190, 685), (0, 193), (176, 251), (710, 478), (73, 241), (169, 251), (1099, 646), (6, 193), (532, 410), (65, 215), (193, 260), (118, 230), (201, 286), (961, 584), (1185, 685), (979, 592), (47, 206), (822, 523), (1255, 719), (225, 295), (556, 419), (206, 263), (601, 439), (830, 527), (769, 505), (111, 230)}
set_1 = {(1241, 689), (1084, 614), (1190, 662), (177, 254), (710, 455), (253, 304), (316, 326), (1155, 649), (84, 242), (526, 383), (1110, 626), (0, 215), (884, 527), (183, 254), (57, 212), (126, 256), (806, 494), (491, 372), (1279, 704), (1018, 584), (823, 503), (1033, 593), (502, 374), (1177, 659), (569, 403), (564, 398), (208, 260), (1097, 623), (64, 212), (1249, 693), (729, 465), (751, 471), (913, 542), (1001, 579), (1265, 701), (590, 409), (377, 329), (176, 251), (6, 193), (1142, 641), (830, 503), (1246, 689), (1255, 719), (1081, 615), (30, 223), (610, 417), (977, 569), (215, 289), (245, 299), (90, 221), (601, 416), (1209, 674), (37, 203), (1193, 666), (1156, 647), (1113, 630), (521, 383), (169, 251), (65, 215), (1124, 633), (1078, 611), (1038, 593), (201, 286), (1094, 619), (206, 263), (657, 437), (1279, 719), (1006, 579), (111, 230), (40, 206), (31, 203), (374, 350), (833, 507), (768, 481), (942, 551), (1214, 674), (1017, 586), (829, 505), (0, 193), (73, 241), (553, 396), (193, 260), (1065, 608), (118, 230), (47, 206), (998, 575), (225, 295), (478, 365), (1137, 641)}
set_2 = {(929, 676), (793, 569), (529, 416), (377, 355), (588, 447), (726, 527), (889, 652), (926, 671), (718, 523), (904, 631), (534, 442), (718, 550), (995, 715), (870, 638), (582, 470), (0, 151), (976, 674), (662, 517), (152, 202), (1005, 719), (665, 522), (950, 685), (961, 667), (182, 218), (1033, 710), (956, 689), (913, 638), (697, 541), (942, 653), (753, 573), (905, 634), (154, 232), (65, 184), (1028, 704), (1027, 706), (830, 588), (670, 523), (643, 481), (1050, 719), (988, 680), (337, 307), (785, 592), (497, 424), (39, 140), (153, 205), (208, 260), (971, 700), (644, 479), (54, 147), (119, 212), (630, 471), (30, 134), (622, 467), (433, 387), (585, 448), (201, 257), (166, 235), (1016, 698), (729, 559), (17, 130), (846, 625), (273, 272), (873, 643), (729, 531), (174, 214), (857, 634), (121, 187), (377, 329), (806, 601), (518, 433), (990, 709), (790, 592), (502, 424), (654, 485), (89, 170), (61, 154), (25, 134), (18, 128), (70, 184), (934, 676), (798, 569), (190, 249), (534, 416), (382, 355), (809, 578), (870, 612), (1017, 701), (94, 196), (894, 652), (913, 665), (742, 536), (206, 257), (905, 661), (198, 253), (977, 677), (825, 616), (753, 545), (46, 143), (769, 581), (57, 152), (25, 163), (528, 411), (104, 175), (150, 227), (169, 240), (105, 204), (1041, 715), (236, 248), (585, 475), (622, 494), (966, 694), (569, 466), (1038, 710), (721, 527), (702, 541), (414, 373), (529, 442), (713, 550), (750, 569), (634, 504), (638, 504), (918, 638), (657, 517), (30, 163), (686, 532), (334, 303), (987, 682), (83, 193), (790, 564), (982, 677), (945, 685), (177, 218), (106, 176), (401, 368), (782, 587), (681, 532), (494, 419), (393, 364), (737, 564), (601, 457), (4, 122), (41, 171), (734, 559), (374, 350), (582, 443), (33, 139), (270, 268), (689, 508), (218, 239), (217, 241), (955, 691), (241, 253), (206, 231), (625, 471), (6, 151), (766, 577), (1022, 701), (398, 364), (118, 209), (561, 434), (841, 598), (51, 149), (910, 661), (883, 622), (830, 616), (886, 648), (993, 686), 
(841, 625), (105, 178), (62, 152), (45, 145), (0, 122), (801, 601), (513, 433), (985, 709), (649, 485), (929, 649), (185, 222), (54, 175), (193, 253)}
set_3 = {(1241, 689), (1084, 614), (1190, 662), (710, 455), (1155, 649), (526, 383), (1110, 626), (884, 527), (1137, 664), (1190, 685), (806, 494), (491, 372), (1279, 704), (1018, 584), (556, 419), (823, 503), (1033, 593), (540, 413), (502, 374), (1177, 659), (569, 403), (564, 398), (1097, 623), (729, 465), (1249, 693), (751, 471), (913, 542), (1001, 579), (1265, 701), (843, 535), (1081, 638), (895, 554), (541, 416), (590, 409), (377, 329), (1142, 641), (830, 503), (1246, 689), (961, 584), (979, 592), (1255, 719), (1081, 615), (610, 417), (1118, 652), (528, 411), (825, 527), (977, 569), (601, 416), (966, 584), (1209, 674), (1193, 666), (1156, 647), (1113, 630), (521, 383), (532, 410), (1124, 633), (1078, 611), (1038, 593), (1185, 685), (822, 523), (1094, 619), (601, 439), (657, 437), (1279, 719), (769, 505), (1006, 579), (833, 507), (768, 481), (942, 551), (1086, 638), (998, 598), (1214, 674), (1150, 667), (1017, 586), (829, 505), (710, 478), (1099, 646), (553, 396), (1065, 608), (998, 575), (830, 527), (478, 365), (1137, 641)}
To calculate the similarity between two sets, I use the following cosine similarity function:
import math

def counter_cosine_similarity(c1, c2):
    # c1 and c2 must be dict-like (e.g. collections.Counter), since .get() is used
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    return dotprod / (magA * magB)
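For context, here is a minimal sketch of how the function can be applied to every pair of sets. It assumes each set is wrapped in a `collections.Counter` (plain sets have no `.get`) and that the score is multiplied by 100 to get a percentage. The helper name `pairwise_similarity` and the toy data are hypothetical, not part of my real code.

```python
import math
from collections import Counter
from itertools import combinations

def counter_cosine_similarity(c1, c2):
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0) ** 2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0) ** 2 for k in terms))
    return dotprod / (magA * magB)

def pairwise_similarity(sets):
    """Return {(i, j): similarity in percent} for every pair of sets."""
    # Wrapping a set in Counter gives every point a count of 1.
    counters = [Counter(s) for s in sets]
    return {
        (i, j): 100 * counter_cosine_similarity(counters[i], counters[j])
        for i, j in combinations(range(len(counters)), 2)
    }

# Toy data (hypothetical, much smaller than the real sets)
toy_sets = [
    {(0, 0), (1, 1), (2, 2), (3, 3)},
    {(0, 0), (1, 1), (2, 2), (9, 9)},
    {(7, 7), (8, 8)},
]
scores = pairwise_similarity(toy_sets)
for (i, j), score in scores.items():
    print(f"set_{i} vs set_{j}: {score}")
```

Because every count is 1, the score for two sets A and B reduces to 100 * |A ∩ B| / sqrt(|A| * |B|).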
And here are the similarity results (in percent):
List_A: set_0
List_B: set_1
Similarity: 44.92804769564075
-----------------
List_A: set_0
List_B: set_2
Similarity: 2.9617443887954615
-----------------
List_A: set_0
List_B: set_3
Similarity: 37.80044164146765
-----------------
List_A: set_1
List_B: set_2
Similarity: 2.370227315699886
-----------------
List_A: set_1
List_B: set_3
Similarity: 67.48293207593244
-----------------
List_A: set_2
List_B: set_3
Similarity: 1.6362689789127198
-----------------
Based on these results, I would like set_0, set_1 and set_3 to be merged together and set_2 to be left as a separate set, so that only 2 sets remain in the final result.
Is there a way to do this automatically? Thanks!
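To make the goal concrete, here is a rough sketch of the kind of automatic grouping I have in mind: link every pair of sets whose similarity reaches a threshold, then union each linked group. The function names, the union-find approach and the threshold of 30 percent are all assumptions for illustration; with the scores listed above, a threshold of 30 would link set_0, set_1 and set_3 and leave set_2 on its own.

```python
import math
from itertools import combinations

def set_cosine_similarity(a, b):
    """Cosine similarity of two sets viewed as 0/1 vectors, in percent."""
    if not a or not b:
        return 0.0
    return 100 * len(a & b) / math.sqrt(len(a) * len(b))

def merge_similar(sets, threshold=30.0):
    """Union all sets linked by a pairwise similarity >= threshold."""
    parent = list(range(len(sets)))  # union-find forest, one node per set

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every sufficiently similar pair (scores use the original sets only).
    for i, j in combinations(range(len(sets)), 2):
        if set_cosine_similarity(sets[i], sets[j]) >= threshold:
            parent[find(j)] = find(i)

    # Collect the union of each linked group.
    groups = {}
    for i, s in enumerate(sets):
        groups.setdefault(find(i), set()).update(s)
    return list(groups.values())

# Toy data (hypothetical)
toy_sets = [
    {(0, 0), (1, 1), (2, 2), (3, 3)},
    {(0, 0), (1, 1), (2, 2), (9, 9)},
    {(7, 7), (8, 8)},
]
merged = merge_similar(toy_sets, threshold=30.0)
print(len(merged))  # number of groups after merging
```

Note that this links sets transitively: if A is similar to B and B is similar to C, all three end up in one group even when A and C are not similar to each other.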
UPDATE 1
I found this function in the question "Python: simple list merging based on intersections", and it gives me what I wanted. The only difference is that I require more than 10 common elements before merging, instead of more than 0.
def merge(lsts):
    sts = [set(l) for l in lsts]
    i = 0
    while i < len(sts):
        j = i+1
        while j < len(sts):
            if len(sts[i].intersection(sts[j])) > 10: # Change the number of intersections
                sts[i] = sts[i].union(sts[j])
                sts.pop(j)
            else:
                j += 1
        i += 1
    lst = [list(s) for s in sts]
    return lst
However, because the lists are converted to sets for the computation, the original order of the coordinate points is lost. Is there another way to do the same thing as the code above while keeping the order?
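One direction I am considering is to keep the data in lists and use a set only for the membership test. The sketch below follows the same loop structure as `merge` above; the name `merge_keep_order` and the `min_common` parameter are my own placeholders, and I am not sure this is the best approach.

```python
def merge_keep_order(lsts, min_common=10):
    """Merge lists sharing more than min_common points, keeping first-seen order."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    merged = [list(dict.fromkeys(l)) for l in lsts]
    i = 0
    while i < len(merged):
        j = i + 1
        while j < len(merged):
            seen = set(merged[i])  # used only for fast lookups
            if len(seen.intersection(merged[j])) > min_common:
                # Append only the points not already present, in their order.
                merged[i].extend(p for p in merged[j] if p not in seen)
                merged.pop(j)
            else:
                j += 1
        i += 1
    return merged

# Toy data (hypothetical), with a low threshold so the example stays small
toy_lists = [
    [(3, 3), (1, 1), (2, 2)],
    [(2, 2), (1, 1), (5, 5)],
    [(9, 9)],
]
result = merge_keep_order(toy_lists, min_common=1)
print(result)
```

Points from the first list keep their positions, and new points from a merged list are appended in the order they appeared there.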