Given a 2D Boolean array, want to find maximal bicliques (complete bipartite graphs) – set of rows×cols having all cells set to true
.
The task is NP-complete and I managed to solve it imperfectly.
Now it gets more interesting:
want to add some "fuzziness": allow a row to be included into a set, if it only misses up to X cells. We'll consider "as if" they're "on" if that helps grow the biclique bigger. How more complex would that make the algorithm? So far I can only see that every found biclique should test all data set again for possible "fakes" that would help grow. Feels wrong. Please advice an algorithm.
the size of the data gets bigger. Maybe if you're fluent with Machine Learning techniques – please advice some "approximate" approach to the problem. With bigger datasets approximate results are sufficient. Please suggest a technique applicable to this problem.
Illustrations of the problem
Blue cells are true
, white false
. The "fuzzy" maximal biclique is framed red. Each row has received 3 "fantom" cells, colored grey. Order of rows/cols does not matter. The maximal biclique doesn't have to be a continuous rectangle, here it's only for the sake of illustrating.
In this test light-blue cells were cancelled out by an optimization phase. The rest form two bi-cliques, rows×cols: (3,5)×(1,3,4,6) and (3,4,5)×(1,3).
However, considering fuzziness=2
, row 4 can have cols (4,6) added, and rows (3,5) can have col 5 added, resulting in one bigger biclique (3,4,5)×(1,3,4,5,6).