2

there is an algorithmic problem. I have several subsets of one super set (there can be n-th number of subsets, depending on the length of the super set). Here is an example:

['/img0.jpg ']
['/img1.jpg']
['/img10.jpg']
['/img11.jpg' '/img9.jpg']
['/img12.jpg' '/img15.jpg']
['/img11.jpg' '/img13.jpg' '/img8.jpg' '/img9.jpg']
['/img14.jpg']
['/img15.jpg']
['/img16.jpg']
['/img14.jpg' '/img17.jpg']
['/img18.jpg']
['/img19.jpg' '/img22.jpg']
['/img2.jpg']
['/img19.jpg' '/img20.jpg' '/img21.jpg' '/img22.jpg']
['/img19.jpg' '/img21.jpg' '/img22.jpg']
['/img22.jpg']
['/img23.jpg']
['/img3.jpg']
['/img3.jpg' '/img4.jpg']
['/img0.jpg' '/img5.jpg']
['/img6.jpg']
['/img2.jpg' '/img7.jpg']
['/img11.jpg' '/img8.jpg' '/img9.jpg']
['/img9.jpg']

As a result , you should get:

['/img1.jpg']
['/img10.jpg']
['/img11.jpg' '/img13.jpg' '/img8.jpg' '/img9.jpg']
['/img12.jpg' '/img15.jpg']
['/img14.jpg' '/img17.jpg']
['/img16.jpg']
['/img18.jpg']
['/img19.jpg' '/img20.jpg' '/img21.jpg' '/img22.jpg']
['/img2.jpg' '/img7.jpg']
['/img23.jpg']
['/img3.jpg' '/img4.jpg']
['/img0.jpg' '/img5.jpg']
['/img6.jpg']

Simply put, the task boils down to finding a super set, bypassing all subsets

Please tell me what are the ways and ideas for writing an algorithm?Can be solved without sets optimally?

I have the following code:

dict_dir = {}

compare_keys = compare.keys() #dictionary with keys as input (n-th number of sets with super sets)

for idx, key1 in enumerate(compare_keys):
    array = []
    for key2 in compare_keys:
        if set(compare[key1]).issubset(set(compare[key2])):
            array.extend(list(compare[key2]))
            
    arr = list(set(array))
    
    dict_dir[f'dir{idx}'] = arr

The following result is obtained:

[['/img0.jpg', '/img5.jpg'],
 ['/img1.jpg'],
 ['/img10.jpg'],
 ['/img11.jpg', '/img13.jpg', '/img8.jpg', '/img9.jpg'],
 ['/img12.jpg', '/img15.jpg'],
 ['/img11.jpg', '/img13.jpg', '/img8.jpg', '/img9.jpg'],
 ['/img14.jpg', '/img17.jpg'],
 ['/img12.jpg', '/img15.jpg'],
 ['/img16.jpg'],
 ['/img14.jpg', '/img17.jpg'],
 ['/img18.jpg'],
 ['/img19.jpg', '/img20.jpg', '/img21.jpg', '/img22.jpg'],
 ['/img2.jpg', '/img7.jpg'],
 ['/img19.jpg', '/img20.jpg', '/img21.jpg', '/img22.jpg'],
 ['/img19.jpg', '/img20.jpg', '/img21.jpg', '/img22.jpg'],
 ['/img19.jpg', '/img20.jpg', '/img21.jpg', '/img22.jpg'],
 ['/img23.jpg'],
 ['/img3.jpg', '/img4.jpg'],
 ['/img3.jpg', '/img4.jpg'],
 ['/img0.jpg', '/img5.jpg'],
 ['/img6.jpg'],
 ['/img2.jpg', '/img7.jpg'],
 ['/img11.jpg', '/img13.jpg', '/img8.jpg', '/img9.jpg'],
 ['/img11.jpg', '/img13.jpg', '/img8.jpg', '/img9.jpg']]

It works a little incorrectly and not optimally, I would like the work of such an algorithm to be minimally spent on memory and time.

cafce25
  • 15,907
  • 4
  • 25
  • 31
Nikita
  • 21
  • 1
  • What is your expected output if there are partially-overlapping sets in the input (e.g. `['A', 'B']` and `['B','C']`)? – motto Apr 15 '23 at 07:56
  • In this case, it is necessary to leave both – Nikita Apr 15 '23 at 07:59
  • 1
    Oh? That is not what you did with `img19`... You still merged it instead of leaving the different subsets that have a partial overlap at `img19`. – trincot Apr 15 '23 at 08:03
  • This feels a bit like the [set cover problem](https://en.wikipedia.org/wiki/Set_cover_problem), of which there's a lot of good discussion on [this previous question](https://stackoverflow.com/a/21974512/21146235). I may be misunderstanding though. – motto Apr 15 '23 at 08:05
  • Perhaps I did not notice the error, so far it can be assumed that there will be no such intersecting sets. – Nikita Apr 15 '23 at 08:11
  • Perhaps the tasks are partially similar, but still my main task is to find a set from all subsets, from all the abundance of subsets of this and other sets – Nikita Apr 15 '23 at 08:31

1 Answers1

1

Given your stipulation that there will be no partially-intersecting sets, the algorithm is pretty straightforward. We can just pick the largest set that contains each element.

Note that this also works under the weaker assumption that even if there are partially-intersecting sets then there's a larger set that contains both of those, but your sample data doesn't seem to need that. It does not guarantee an optimal result in the general case outlined in the set cover problem, but it won't break the algorithm.

def find_cover(candidates):
    """
    Filter a list of lists of elements, such that the output
    contains all elements in the input.
    """

    # All elements must appear in the output somewhere
    remaining_elements = set(sum(candidates, []))

    cover = []

    # Search from largest to smallest --
    # if all supersets are disjoint then there this is optimal,
    # but even if there's overlap then the greedy strategy produces
    # decent results in the general case.

    for candidate in sorted(candidates, key=len, reverse=True):
        if not remaining_elements.isdisjoint(candidate):
            # If this set contains one or more candidates then
            # include it in the result
            cover.append(candidate)
            remaining_elements.difference_update(candidate)

            if not remaining_elements:
                # All elements are now accounted for, we can go home early
                break

    return cover

Output for your sample input:

[['/img11.jpg', '/img13.jpg', '/img8.jpg', '/img9.jpg'],
['/img19.jpg', '/img20.jpg', '/img21.jpg', '/img22.jpg'],
['/img12.jpg', '/img15.jpg'],
['/img14.jpg', '/img17.jpg'],
['/img3.jpg', '/img4.jpg'],
['/img0.jpg', '/img5.jpg'],
['/img2.jpg', '/img7.jpg'],
['/img1.jpg'],
['/img10.jpg'],
['/img16.jpg'],
['/img18.jpg'],
['/img23.jpg'],
['/img6.jpg']]
motto
  • 2,888
  • 2
  • 2
  • 14
  • You probably could optimize that by just checking if the first element of the current set is contained in the previous set. That will be true iff the current set is a subset of the previous set. – Juan Lopes Apr 15 '23 at 12:56
  • @JuanLopes yes, absolutely! Using the intersection keeps the method robust against a greater variety of inputs, but for well-formed inputs it would certainly suffice to just check if the first (or any) element in each `candidate` is in `remaining_elements`. – motto Apr 15 '23 at 13:00
  • please tell me what to do if the sets intersect? – Nikita May 09 '23 at 13:38
  • In the case where the sets intersect you have a [set cover problem](https://en.wikipedia.org/wiki/Set_cover_problem), which is discussed in a [previously accepted answer](https://stackoverflow.com/a/21974512/21146235). – motto May 09 '23 at 16:31