I have a dictionary of arrays like this:

import numpy as np

y_dict = {1: np.array([5, 124, 169, 111, 122, 184]),
          2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
          3: np.array([169, 5, 111, 152]),
          4: np.array([0, 567, 5, 78, 90, 111]),
          5: np.array([]),
          6: np.array([])}

I need to find the intersection of the arrays in my dictionary, y_dict. As a first step, I removed the empty arrays from the dictionary, like this:

dic = {i:j for i,j in y_dict.items() if np.array(j).size != 0}

So dic now looks like this:

dic = { 1: np.array([5, 124, 169, 111, 122, 184]),
        2: np.array([1, 2, 3, 4, 5, 6, 111, 184]), 
        3: np.array([169, 5, 111, 152]), 
        4: np.array([0, 567, 5, 78, 90, 111])}

To find the intersection, I tried a tuple-based approach:

result_dic = list(set.intersection(*({tuple(p) for p in v} for v in dic.values())))

The actual result is an empty list: []

The expected result is: [5, 111]

Could you please help me find the intersection of the arrays in the dictionary? Thanks

Jean-François Fabre
Cindy

2 Answers

The code you posted is overly complex and wrong, because there's one extra inner iteration that needs to go. You want to do:

result_dic = list(set.intersection(*(set(v) for v in dic.values())))

or with map and without a for loop:

result_dic = list(set.intersection(*(map(set,dic.values()))))

result

[5, 111]
  • iterate on the values (ignore the keys)
  • convert each numpy array to a set (the first argument must be a set, but the remaining ones could be any iterable, such as tuples, because intersection accepts arbitrary iterables)
  • pass the lot to intersection with argument unpacking
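
The three steps above can be spelled out one at a time. This is only a sketch using the sample data from the question; the intermediate names (`arrays`, `sets`, `common`) and the `.tolist()` call are additions for illustration, not part of the original one-liner.

```python
import numpy as np

# Sample data copied from the question (empty arrays already removed).
dic = {
    1: np.array([5, 124, 169, 111, 122, 184]),
    2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
    3: np.array([169, 5, 111, 152]),
    4: np.array([0, 567, 5, 78, 90, 111]),
}

# Step 1: iterate on the values only; the keys play no role.
arrays = list(dic.values())

# Step 2: convert each 1D array to a set of its elements.
# .tolist() (an addition here) yields plain Python ints instead of NumPy scalars.
sets = [set(a.tolist()) for a in arrays]

# Step 3: unpack the sets as separate arguments to set.intersection.
common = set.intersection(*sets)

# Sets are unordered, so sort for a stable display order.
result = sorted(common)
print(result)  # [5, 111]
```

Sorting at the end is optional; it only makes the output order predictable, since a plain `list(common)` may come out in any order.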

We can even skip the question's first step (removing the empty arrays from the dictionary) by creating a set from every array and filtering out the empty ones using filter:

result_dic = list(set.intersection(*(filter(None,map(set,y_dict.values())))))

That's for the sake of a one-liner, but in real life, expressions can be decomposed so they're more readable and easier to comment. That decomposition also helps avoid the crash that occurs when set.intersection is passed no arguments (because there were no non-empty sets), a case that defeats the smart way to intersect sets (first described in Best way to find the intersection of multiple sets?).

Just create the list beforehand, and call intersection only if the list is not empty. If it is empty, just use an empty list instead, so that the result is a list in both cases:

non_empty_sets = [set(x) for x in y_dict.values() if x.size]
result_dic = list(set.intersection(*non_empty_sets)) if non_empty_sets else []
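
The same guard can be wrapped in a small function so the empty case is handled in one place. This is a minimal sketch, assuming every value is a 1D NumPy array; the function name `intersect_non_empty` is hypothetical, and returning an empty list when nothing is left to intersect is one possible choice rather than the only reasonable one.

```python
import numpy as np

def intersect_non_empty(arrays_by_key):
    """Return the sorted values shared by all non-empty 1D arrays (hypothetical helper)."""
    # Build one set per non-empty array; empty arrays are skipped.
    non_empty_sets = [set(a.tolist()) for a in arrays_by_key.values() if a.size]
    # Guard: set.intersection() raises TypeError when called with no arguments.
    if not non_empty_sets:
        return []
    return sorted(set.intersection(*non_empty_sets))

y_dict = {
    1: np.array([5, 124, 169, 111, 122, 184]),
    2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
    3: np.array([169, 5, 111, 152]),
    4: np.array([0, 567, 5, 78, 90, 111]),
    5: np.array([]),
    6: np.array([]),
}

print(intersect_non_empty(y_dict))                # [5, 111]
print(intersect_non_empty({1: np.array([])}))     # []
```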
Jean-François Fabre
  • Thanks a lot for your explanation :) – Cindy Dec 10 '19 at 15:08
  • I tried the last solution, without the step 1 mentioned for deleting empty arrays from the dict, and got the error `unhashable type: 'numpy.ndarray'`. Do you have any idea why this happens? – Cindy Dec 10 '19 at 15:19
  • I think your real data is different from the data shown. Maybe you have 2D arrays? Converting a numpy array to a set iterates on the elements of the array. If the elements are hashable (like integers) it works, else it doesn't. – Jean-François Fabre Dec 10 '19 at 15:35
  • For a `dict` with more than 20 arrays inside, a weird error appears: `descriptor 'intersection' of 'set' object needs an argument` :-( – Cindy Dec 10 '19 at 19:40
  • you need to create another question, and a [mcve] for this – Jean-François Fabre Dec 10 '19 at 20:04
  • @Jean-FrançoisFabre You have the same bug as https://stackoverflow.com/questions/59275902/list-comprehension-and-intersection-problem/59275948?noredirect=1#comment104758873_59275948 – wim Dec 10 '19 at 22:00
  • err, what do you mean? `>>> set.intersection({3,4},{5,4}) {4}` there's no `self` in `set.intersection` – Jean-François Fabre Dec 10 '19 at 22:05
  • Yes there is. That only works because the `{3,4}` gets sent in as "self". It will crash with an error message very similar to [the one Cindy was seeing](https://stackoverflow.com/questions/59269849/python-intersection-of-arrays-in-dictionary?noredirect=1&lq=1#comment104755618_59269927) if you try to intersect an empty collection (e.g. `set.intersection(*(set(v) for v in {}.values()))`). – wim Dec 10 '19 at 22:09
  • adapted the answer to fix this issue – Jean-François Fabre Dec 10 '19 at 22:17
  • OK, but it's just a bandaid fix and this code is still very ugly so my vote stands. – wim Dec 10 '19 at 22:56
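
Both errors discussed in the comments above can be reproduced in isolation. This is a sketch under assumptions: the 2D array is made-up data standing in for the "maybe you have 2D arrays" guess, and the empty argument list stands in for the empty-collection case wim describes. The exact wording of the second message differs between Python versions, so the code only records it.

```python
import numpy as np

unhashable_error = None
no_argument_error = None

# Case 1: a 2D array (hypothetical data). Iterating it yields rows,
# which are themselves arrays and therefore not hashable.
two_d = np.array([[1, 2], [3, 4]])
try:
    set(two_d)
except TypeError as exc:
    unhashable_error = str(exc)

# Case 2: nothing to intersect. Unpacking an empty list calls
# set.intersection() with no arguments, so there is no set to act as "self".
try:
    set.intersection(*[])
except TypeError as exc:
    no_argument_error = str(exc)

print(unhashable_error)
print(no_argument_error)
```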
1

You should use NumPy's intersection here (np.intersect1d) rather than doing it in pure Python. You'll also need to add special handling for the case where there is nothing to intersect.

>>> intersection = None
>>> for a in y_dict.values(): 
...     if a.size: 
...         if intersection is None: 
...             intersection = a 
...             continue 
...         intersection = np.intersect1d(intersection, a) 
...
>>> if intersection is not None: 
...     print(intersection)
...
[  5 111]

If intersection is None, all of the arrays in y_dict had size zero (no elements). In this case the intersection is not well-defined, and you have to decide for yourself what the code should do. Raising an exception is probably right, but it depends on the use case.
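
The loop can also be written as a fold. This is a sketch rather than the answer's own code: the function name `intersect_arrays` is hypothetical, `functools.reduce` is swapped in for the explicit loop, and raising `ValueError` is just one way to handle the undefined case.

```python
from functools import reduce

import numpy as np

def intersect_arrays(arrays_by_key):
    """Intersect all non-empty arrays with np.intersect1d (hypothetical helper)."""
    non_empty = [a for a in arrays_by_key.values() if a.size]
    if not non_empty:
        # Assumption: treat "nothing to intersect" as an error.
        raise ValueError("no non-empty arrays to intersect")
    # With a single array, reduce returns it unchanged, as the loop would.
    return reduce(np.intersect1d, non_empty)

y_dict = {
    1: np.array([5, 124, 169, 111, 122, 184]),
    2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
    3: np.array([169, 5, 111, 152]),
    4: np.array([0, 567, 5, 78, 90, 111]),
    5: np.array([]),
    6: np.array([]),
}

print(intersect_arrays(y_dict))  # [  5 111]
```

Because `np.intersect1d` returns sorted unique values, the result has a predictable order whenever at least two arrays are intersected.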

wim