2

I have a few lists

A =['D','KIT','SAP'],
B= ['F','G','LUFT','SAP'],
C= ['I','SAP'],
D= ['SAP','LUF','KIT'],
F= ['SAP','LUF','KIT','HASS']

I passed them in combinations to a dictionary. For example, I joined lists A and B as a key 'AB' in my new dictionary and appended the values from both lists like the list of values to that key 'AB'.

my_dict={"AB":[['D','KIT','SAP'],['F','G','LUFT','SAP']]}

My aim to get the overlap between the lists in multiple combination. For that, I could get with the following one liner.

   for k,v in my_dict.items():
        print(k,list(set.intersection(*[set(x) for x in v])))

AB ['SAP']

However, the problem here is that when I have more lists and more combinations it is tedious to define manually a dictionary with all possible combinations.

The aim is to achieve all possible combinations of lists. From the above-mentioned example, I have the following combination of lists. In this case that would be 5*5 +1 = 26 combinations. And in the end, I would like to get the overlap for each of the following combinations.

   AB, AC, AD, AF
    ABC, ABD, ABF
    ABCD, ABCDF
    ABCDF
BC, BD, BF
DF, DC
FC

The question is how could I achieve that in a pythonic way. Any suggestions are much appreciated.

As a side note, I do not want a dictionary in between if this combination of overlap is possible from the lists itself.

Thanks

ARJ
  • 2,021
  • 4
  • 27
  • 52
  • That list doesn't contain all combinations. What about `BC, BD..` ETC? – sagi Feb 10 '22 at 12:44
  • Yes, you are right. I just updated the question with all possible combinations. thanks for pointing that ou! – ARJ Feb 10 '22 at 12:48
  • Just to make sure; you have a list of lists, and you want the items that appears in both list? I.e for `DF` you would have `['SAP','LUF','KIT'] ` returned, since it's the items appearing in `D` and `F` – CutePoison Feb 10 '22 at 12:57
  • Yes, but in combinations. Not just a list of lists and get one overlap. In the end I need to get overlap for each combinations I mentioned in the question – ARJ Feb 10 '22 at 12:59
  • But is the question regarding creating the combinations or getting the overlapped items? And what when you have 3 lists like `[["foo","bar","baz"],["foo"],["foo","qux","dal"]]` should that be `["foo","bar"]` (since it appears in more than 1) or only `"foo"` since it appears in all? – CutePoison Feb 10 '22 at 13:00
  • Getting the overlaps from all combinations. That is my question to be precise. Meaning that, I need the output for, AB, AC, AD, AF ABC, ABD, ABF ABCD, ABCDF ABCDF BC, BD, BF DF, DC FC – ARJ Feb 10 '22 at 13:02
  • Yes, but is it *only* items that appears in *all* lists in a combination, or just items that overlap in some lists? say `A=["foo","bar","baz"]`,`B = ["foo","qux"]`, `C=["dal","bar"]` is `ABC` then returning `["foo"]` (it is in all of them) or `["foo","bar"]` (`bar` is in `A` and `C` i.e an overlap) – CutePoison Feb 10 '22 at 13:07

5 Answers5

3

Have a look at itertools.combinations. It returns all possible combinations of a given length for a given iterable. You'll have to loop over all possible lengths.

import itertools

lists = [A, B, C, D, E, F]

combinations = []

for l in range(2, len(lists) + 1):
    combinations += itertools.combinations(lists, l)

combinations will be a list of tuples with all possible combinations of at least two lists.

1

[Edited solution after exchange in comments]

import itertools as it

A =['D','KIT','SAP'],
B= ['F','G','LUFT','SAP'],
C= ['I','SAP'],
D= ['SAP','LUF','KIT'],
F= ['SAP','LUF','KIT','HASS']

master_dict = dict(A=A, B=B, C=C, D=D, F=F)
for i in range(len(master_dict)):
    for combo in it.combinations(master_dict.keys(), i):
        new_key = "".join(combo)
        print(combo)
        if (len(combo)>1):
            mixed_dict[new_key] = set.intersection(*[set(master_dict[c][0]) for c in combo])
            print("---->" + str(mixed_dict[new_key]))

... with a couple of commented out print statements to show you what's going on. At the end of the script, you have a new_dict which looks like this:

enter image description here

I didn't add the combinations of length 1 on purpose, but you can remove the if() condition above to get them back.

butterflyknife
  • 1,438
  • 8
  • 17
  • Thanks for the solution. The reason to use the magic keys is that wanted to know from does the overlap is coming. Meaning, if the overlap is from all lists I know it is from ABCDF. however, if the overlap is only between AB. There, I need a key indicator, telling me from which lists I am getting this overlapped list. Additionally, it is a scalable solution when you have multiple lists. – ARJ Feb 10 '22 at 13:14
  • The solution is not generating the overlaps as well as the combinations correctly. For an input of 5 lists. One would expect 5*5 +1 combinations. Total 26 combinations. The current combination of output is larger than that. – ARJ Feb 10 '22 at 13:17
  • I don't think you're counting the combinations correctly. For example, you don't have ACD in your list. – butterflyknife Feb 10 '22 at 13:25
  • Thanks, yes, it is not in the question. You are right. The point is that we need 26 combinations. Which I think is not there in your solution. There is only 25 combination. For example, ABCDF is missing. – ARJ Feb 10 '22 at 13:28
1

Here is the dictionary creation with the answer of @Marco Breemhaar implemented, and I used his solution to create a dictionary as you wanted. Here is the code:

import itertools

import string
alphabet_string = string.ascii_uppercase
alphabet_list = list(alphabet_string)


A =['D','KIT','SAP']
B= ['F','G','LUFT','SAP']
C= ['I','SAP']
D= ['SAP','LUF','KIT']
E= ['SAP','LUF','KIT','HASS']

lists = [A, B, C, D, E]

combinations = []

for l in range(2, len(lists) + 1):
    combinations += itertools.combinations(lists, l)

alp_comb = []
for l in range(2, len(alphabet_list) + 1):
    alp_comb += itertools.combinations(alphabet_list[:len(lists)], l)

key_list = []
for alp_elem in alp_comb:
    key_name = ""
    for elem in alp_elem:
        key_name += elem
    key_list.append(key_name)
    key_name = None


res_dict = dict()
for key, elem in zip(key_list, combinations):
    res_dict[key] = list(elem)
print(res_dict)
1

To start solving the problem, I would turn the lists given to you into a dictionary. This is to easily reference the name of the list and the value with it.

Like what sagi commented, the combinations you listed does not give all combinations. So, I used the package itertools and the code from, https://stackoverflow.com/a/5898031/5131550, to get all possible combinations and slightly modified it to only give the results that are greater than one list long.

Then we need to add these to my_dict. I used a code snippet from geeksforgeeks, https://www.geeksforgeeks.org/python-program-to-convert-a-tuple-to-a-string/, to turn the tuple into a string that we use as a key for my_dict. At the same time we are building a list to add to those afformentioned keys.

Then we just use your function to find the interlapping elements. We should expect there are 2^(len(listsdict)) - len(listsdict) - 1, because we exclude no elements and only one element. Hope this solves it.

from itertools import combinations

#turn list into a dictionary of lists
listsdict = {
    'A' : ['D','KIT','SAP'],
    'B' : ['F','G','LUFT','SAP'],
    'C' : ['I','SAP'],
    'D' : ['SAP','LUF','KIT'],
    'F' : ['SAP','LUF','KIT','HASS']
}

my_dict = {}

#from https://www.geeksforgeeks.org/python-program-to-convert-a-tuple-to-a-string/
def convertTuple(tup):
    str = ''
    list = []
    for item in tup:
        str = str + item
        list += [listsdict[item]]
    my_dict[str] = list
    return str

#from https://stackoverflow.com/a/5898031/5131550
for L in range(0, len(listsdict)+1):
    for subset in combinations(listsdict, L):
        if len(subset) > 1:
            convertTuple(subset)

#OP's functions
for k,v in my_dict.items():
    print(list(set.intersection(*[set(x) for x in v])))
  • I get this error, at from https://stackoverflow.com/a/5898031/5131550 TypeError: 'list' object is not callable – ARJ Feb 10 '22 at 14:01
0

Thanks for all solutions. Each solution was helpful. However, none of them was the exact answer I was looking for. Nonetheless, all of them partly helped me to come up with the right solution. Posting for everyone might be helpful.

I first needed to create 26 combinations for 5 lists. The initial lists

A =['D','KIT','SAP']
B= ['F','G','LUFT','SAP']
C= ['I','SAP']
D= ['SAP','LUF','KIT']
E= ['SAP','LUF','KIT','HASS']

Made a list of lists

lists = [A, B, C, D, E]

Defined empty list called combination as @marco's solution. And added all possible combinations based on the length of lists, plus one. The combinations are the values from our final dictionary.

 combinations =[]
    for l in range(2, len(lists) + 1):
        combinations += itertools.combinations(lists, l)

After that define a dictionary with keys and values from the list of the list. From here I adopted partly @butterflyknife's solution.

master_dict = dict(A=A, B=B, C=C, D=D, F=F)

For the keys of our final dictionary, we defined an empty list first. Then populated the combinations by using join.

key_list    = []
for i in range(2,len(master_dict) +1):
    for combo in it.combinations(master_dict.keys(), i):
        new_key = "".join(combo)
        if (len(combo)>1):
            key_list.append(new_key)

Now, with both keys and values. We populate our final dictionary.

final_dict = dict()
for key, elem in zip(key_list, combinations):
    final_dict[key] = list(elem)
   

This part is to find the overlaps and convert them to a data frame.

overlap_list  = []
dfs = []
for k,v in final_dict.items():
    overlap = list(set.intersection(*[set(x) for x in v]))
    overlap_list.append(overlap)
     #print(overlap, sep=", ")
    dct  ={ 'overlap':overlap,'number of overlapps':len(overlap),
              'overlap models':k}
    dfs.append(dct)
df          = pd.DataFrame.from_dict(dfs)
ARJ
  • 2,021
  • 4
  • 27
  • 52