Here is my problem: I have a sequence S of (nonempty but possibly not distinct) sets s_i, and for each s_i need to know how many sets s_j in S (i ≠ j) are subsets of s_i.
I also need incremental performance: once I have all my counts, I may replace one set s_i by some subset of s_i and update the counts incrementally.
Performing all this using purely functional code would be a huge plus (I code in Scala).
As set inclusion is a partial ordering, I thought the best way to solve my problem would be to build a DAG that would represent the Hasse diagram of the sets, with edges representing inclusion, and join an integer value to each node representing the size of the sub-dag below the node plus 1. However, I have been stuck for several days trying to develop the algorithm that builds the Hasse diagram from the partial ordering (let's not talk about incrementality!), even though I thought it would be some standard undergraduate material.
Here is my data structure :
case class HNode[A] (
val v: A,
val child: List[HNode[A]]) {
val rank = 1 + child.map(_.rank).sum
}
My DAG is defined by a list of roots and some partial ordering:
class Hasse[A](val po: PartialOrdering[A], val roots: List[HNode[A]]) {
def +(v: A): Hasse[A] = new Hasse[A](po, add(v, roots))
private def collect(v: A, roots: List[HNode[A]], collected: List[HNode[A]]): List[HNode[A]] =
if (roots == Nil) collected
else {
val (subsets, remaining) = roots.partition(r => po.lteq(r.v, v))
collect(v, remaining.map(_.child).flatten, subsets.filter(r => !collected.exists(c => po.lteq(r.v, c.v))) ::: collected)
}
}
I am pretty stuck here. The last I came up to add a new value v to the DAG is:
- find all "root subsets" rs_i of v in the DAG, i.e., subsets of v such that no superset of rs_i is a subset of v. This can be done quite easily by performing a search (BFS or DFS) on the graph (
collect
function, possibly non-optimal or even flawed). - build the new node n_v, the children of which are the previously found rs_i.
- Now, let's find out where n_v should be attached: for a given list of roots, find out supersets of v. If none are found, add n_v to the roots and remove subsets of n_v from the roots. Else, perform step 3 recursively on the supersets's children.
I have not yet implemented fully this algorithm, but it seems uncessarily circonvoluted and nonoptimal for my apparently simple problem. Is there some simpler algorithm available (Google was clueless on this)?