0

This is probably a question very similar to other ones asked previously, but I can't find the solution to my specific case.

I have lists of items. EDIT I'll show 4 lists, but the real problem could have thousands of lists, say 3000:

List 1: Items A, B, C, D

List 2: Items C, D, E, F

List 3: items A, D, F

List 4: Items B, C, D, F

I want to group the lists so that the resulting groups have the minimum sum of unique items.

Example:

List (1 + 2): A, B, C, D, E, F = 6 items

List (3 + 4): A, B, C, D, F = 5 items

Total = 6 + 5 = 11 items

Alternative solution

List (1 + 4): A, B, C, D, F = 5 items

List (2 + 3): A, C, D, E, F = 5 items

Total: 5 + 5 = 10 items, which would be a better solution than the first one. EDIT The number of groups should be editable in the code/by user input and it can range from 1 to 10.

The purpose is to create scenarios:

  • 1 group --> all the 3.000 lists are merged together, the result is a simple count of unique items
  • 2 groups --> 1.500 lists per group (approximately, it doesn't have to be exact), the purpose is to find the smart grouping to minimize the sum of unique items of the 2 groups
  • 3 groups --> ...
  • 4 groups --> ...
  • 3.000 groups --> each list is a group, where the result is the sum of the distint count of items in each list

Thank you very much.

Steff92
  • 11
  • 1
  • Does this answer your question? [Get only unique elements from two lists](https://stackoverflow.com/questions/28444561/get-only-unique-elements-from-two-lists) –  Jul 21 '21 at 11:24
  • Forgot to mention: I want to address the problem with Python. – Steff92 Jul 21 '21 at 11:25
  • What have you tried so far? – IoaTzimas Jul 21 '21 at 11:26
  • @Sujay that is not exactly what OP wants – gold_cy Jul 21 '21 at 11:27
  • 1
    This question has a straightforward answer which I was in the process of writing when it was closed. It should not have been closed. – kaya3 Jul 21 '21 at 11:29
  • ```len(list(set(List1+List2)))```? –  Jul 21 '21 at 11:29
  • @kaya3 I agree, I voted to reopen, power users on this site are to quick to rush to vote to close questions. – gold_cy Jul 21 '21 at 11:29
  • 1
    @Sujay I think you have not actually understood this question. The problem is not how to find the number of unique elements across two lists. – kaya3 Jul 21 '21 at 11:29
  • @kaya3 voted to reopen –  Jul 21 '21 at 11:32
  • I am starting with Python and I have done simple exercises, for this one I wouldn't know where to start, I expect there will be libraries that will do the job or pretty much of it. The purpose is to group intelligently the lists, so that the amount of unique items among the different groups is minimized – Steff92 Jul 21 '21 at 11:32
  • 1
    What about to figure up all combinations and take the best? What is wrong with this approach? Are you looking for a heuristic function to discard some combinations? – dani herrera Jul 21 '21 at 11:36
  • Your question needs clarification, more details about how many lists do you take, if do you need 2 groups or groups of 2 lists, etc. At this time, the solution is: `print("List (1 + 4) ,List (2 + 3)")` – dani herrera Jul 21 '21 at 11:39
  • Are there any strict time constraints? why not brute force? a = itertools.combinations(list_of_all_lists, 2) and len(list(set(a[0]+a[1]))) and appending it to the dictionary and grabbing the min – darth baba Jul 21 '21 at 11:40
  • 1
    @daniherrera There are `(n-1)!!` perfect matchings of a complete graph - where `!!` is the [double factorial](https://en.wikipedia.org/wiki/Double_factorial) - so the brute force approach is only going to work for very small `n`. – kaya3 Jul 21 '21 at 11:41
  • @kaya3, How do you know n can be different from 4? – dani herrera Jul 21 '21 at 11:42
  • 1
    @daniherrera By applying common sense. – kaya3 Jul 21 '21 at 11:42
  • I think the OP is a new python programmer and needs a solution for this particular question with n = 4 – darth baba Jul 21 '21 at 11:44
  • I can have, say, 3.000 lists of different lengths (minimum 1, maximum N items in each list) The grouping should be done in such a way that the set of lists have approximately the same length. I also need to specify the desired amount of sets of lists (can be anything between 1 and 10, for example). I will edit the post to clarify – Steff92 Jul 21 '21 at 11:44
  • @Steff92, do you want to create 2 groups or 1500 groups of 2 lists? Because on your sample do you have 4 lists and 2 groups, is there a way to know the answer reading your question? – dani herrera Jul 21 '21 at 11:46
  • @daniherrera As a new Python programmer it is kind of hard to know which info is relevant to summarize. I appreciate the support and the critics anyway. – Steff92 Jul 21 '21 at 11:56
  • 1
    Sounds like this is not really a python question but maybe a general algorithm question? – Almog-at-Nailo Jul 21 '21 at 11:57
  • As @AlmogAtNailo says, this is not about python. – dani herrera Jul 21 '21 at 11:57
  • Post edited to clarify. @AlmogAtNailo I am struggling to implement this in Python, I was hoping some library could do the job. – Steff92 Jul 21 '21 at 12:01
  • 1
    If you have a solution and only need help with the implementation, you can post it here and we'll try and help you improve it. I'm doubtful that there is a library somewhere that does exactly what you want but maybe parts of it could be solved in a different way that will help you. – Almog-at-Nailo Jul 21 '21 at 12:06

1 Answers1

1

This can be modelled as a maximum weight matching problem. Consider a graph in which your lists are vertices, and there is an edge between any two vertices whose weight is some large enough constant minus the number of distinct elements in the union of the two lists. (The large constant is so that the weights aren't negative, and so that a maximum-weight matching necessarily includes all vertices.)

The problem is then to find a set of edges not sharing any vertices, whose sum of weights is maximised. This can be achieved in O(n4) time using the blossom algorithm.

The problem is also known as the "minimum-weight perfect matching" problem, so you may find more information about other approaches by searching for that term.

kaya3
  • 47,440
  • 4
  • 68
  • 97