While working on a feature selector for a machine learning algorithm in Python, I generated a data structure with the following code:
# Perform set partitioning on the results
groups = []
for t in results:
(jthName,kthName) = t
jthGroup = -1
kthGroup = -1
# Just a simple list of hashes with online merging
for idx,group in enumerate(groups):
if jthName in group:
jthGroup = idx
if kthName in group:
kthGroup = idx
if jthGroup == kthGroup:
if jthGroup == -1: # Implicit: "and kthGroup == -1"
groups.append(set((jthName,kthName)))
elif jthGroup != kthGroup:
if kthGroup == -1:
# Merge kthName into jthGroup
groups[jthGroup].add(kthName)
elif jthGroup == -1:
# Merge jthName into kthGroup (redundant if naturally-ordered)
groups[kthGroup].add(jthName)
else:
# Merge jthGroup and kthGroup, since we have a connecting pair
merged = set()
merged.update(groups[jthGroup])
merged.update(groups[kthGroup])
groups.remove(groups[jthGroup])
groups.remove(groups[kthGroup])
groups.append(merged)
Where my input, results
, is a list of tuple{2} and groups
is a list of set. Note that my code isn't necessarily efficient here; it's provided only for illustrative purposes.
My data structure, groups
, has the following properties:
For each
(jthName,kthName)
:- If neither element of
(jthName,kthName)
is found in any contained set, createset((jthName,kthName))
within our list of sets. - If exactly one of
(jthName,kthName)
is found in one contained set, coalesce the unfound element into that set. - If each element of
(jthName,kthName)
is found in a different set, coalesce the two referenced sets into a single set.
- If neither element of
Loop invariant:
jthName
andkthName
cannot be contained in more than one set.
My justification for this data structure is to create a flat decomposition of an unknown set of connected nodegraphs, where each unique element name is a node and each unique pair is an edge. My rationale is that my graphs are incomplete, and I require this view to select only the known members of each graph to feed into an algorithm that will regressively determine graph connectivity and directionality of the edges (that is, the complete set of DAGs expressed by the data). But, I digress.
Is there a friendly name for the data structure represented by variable groups
? If so or if not, is there a more time- or space-efficient means of performing this decomposition?