11

I have an arbitrary set of items (dots below), and some number of categories overlapping in an arbitrary way (A-C below). The test is to determine whether it's possible for each item to be assigned to a single category, among the ones it already belongs to, such that each category ends up with at least some number of items.

So for example, we may require that A, B, and C can each claim one item. If we're given all 4 dots below, showing that this is possible is easy: just stick all the items in a list, loop through the categories, and have each category remove an item that it has access too, and as long as each category is able to remove one item, we pass the test.

Venn diagram, with three circles and colored dots in the overlapping sections

Now suppose we remove the blue dot and repeat the test. Clearly we can assign yellow to A, red to B, and green to C, and again we pass. But it's hard to codify this solution: if we follow the previous method (again no blue dot), then suppose A starts by removing the green dot. Now if B removes the yellow dot, we'll fail the test (since C can't remove the red dot), whereas if B removes the red dot, C can still take yellow and pass.

One could solve this by brute force by iterating through every possible assignment of items to categories, checking to see if the condition is met with each iteration, but this won't scale well to arbitrary numbers of items and categories, and I have a feeling there's a smarter or more efficient way.

I don't know the name of this problem, which makes it hard to research. Does it have an optimal solution? If so, what kind of complexity can I expect for the solution?

Shep
  • 7,990
  • 8
  • 49
  • 71
  • 2
    I haven't thought this through fully, but it feels like this ought to be solvable with a dynamic-programming approach, and thus greatly benefit from memoization. – Oliver Charlesworth Aug 23 '13 at 20:31
  • I find it hard to understand the formal nature of the problem. Do you mean "for each item to be assigned to a single category *among the one it belongs to*, such that each category ends up with *at least a given* number of items"? – Giulio Franco Aug 23 '13 at 20:31
  • @GiulioFranco, exactly, edited – Shep Aug 23 '13 at 20:33
  • 4
    This question appears to be off-topic because it seems more appropriate for math.stackexchange.com. – Barmar Aug 23 '13 at 20:37
  • @Barmar, I'm implementing it in C++, but it's somewhat language agnostic, so I decided to leave out the code. – Shep Aug 23 '13 at 20:40
  • If your question were "I found this algorithm to categorize items, but I can't figure out how to code it in C++", that would be appropriate for SO. But if the question is about the algorithm itself, that's pure mathematics (set theory or graph theory). – Barmar Aug 23 '13 at 20:42
  • I tend to think that algorithms are pretty central to coding, and I'd rather keep it here, but you may be right that this is better as two questions. Well, I guess we'll see how the close votes stack up. – Shep Aug 23 '13 at 20:45
  • 6
    Maybe I'm missing some rule, but I think algoritmic theory is a branch of computer science. The help center says "if your question generally covers [...] a software algorithm [...] then you’re in the right place to ask your question!" – Giulio Franco Aug 23 '13 at 21:23
  • Yeah, glancing at the math site, I don't know if this question is really appropriate there. Maybe overzealous closing... – Shep Aug 23 '13 at 21:27
  • 1
    I tried to edit this to make it seem less like a search question or a "name that tune" question. If I've made hash of it, please feel free to continue to improve it. – Todd A. Jacobs Aug 23 '13 at 22:02
  • 2
    @CodeGnome, I think those edits improve the post a lot. – Shep Aug 23 '13 at 22:07

2 Answers2

6

You were right to point out that it's an assignment problem, and as such, it can be solved using Graph Theory classic algorithms.

You can translate your problem as follows:

enter image description here

The nodes on one side represent the categories and the nodes on the other side represent the items. You want to find a maximal match. This problem can be reduced to finding maximal flow in a bi-partite graph.

Fold-Fulkerson can be used to find a maximum matching in a bipartite graph in O(ES) where E is the number of edges and S is the maximal flow in the network.

Nir Alfasi
  • 53,191
  • 11
  • 86
  • 129
3

Let D denotes the set of your dots, and C denotes the set of categories.

You can construct a bipartite graph G(D ∪ C, E), where D ∪ C is a set of vertices and E is a set of edges between D and C.

(d, c) is in E if and only if dot d belongs to category c.

Then you are interested in finding a maximal bipartite matching in this graph.

Because the graph is bipartite, it can be solved by a well known Hopcroft–Karp algorithm

If the cardinality of the computed maximal matching is equal to the size of C, then it is possible to match every category to a different dot and this matching describes this assigning.

pkacprzak
  • 5,537
  • 1
  • 17
  • 37