Generate unique lists from other

Question

I have the following code that generates unique lists from other lists:

import itertools

def print_results_ordered_in_list(inputx):
    io = 0
    input_list = list(inputx)
    while io < (len(input_list)):
        input_list[io] = list(input_list[io])
        input_list[io] = sorted(input_list[io], key=int)
        print(input_list[io])
        io += 1

list1 = ['01', '02', '03']
list2 = ['02', '03', '04']
list3 = ['04', '05', '03']
list4 = ['05', '07', '06']
list5 = ['08', '06', '07']

listas = [list1, list2, list3, list4, list5]
all_results = set(tuple(set(i)) for i in itertools.product(*listas) if len(set(i)) == 5)

print_results_ordered_in_list(all_results)

This generates 42 different lists, among them:

['01', '03', '05', '06', '07']
['02', '03', '04', '05', '06']
['01', '03', '04', '05', '07']
['01', '03', '04', '05', '07']
['02', '03', '04', '05', '06']
['03', '04', '05', '06', '08']
['02', '03', '04', '06', '08']
['01', '02', '04', '07', '08']
['01', '04', '05', '06', '08']
    ...others lists below...

It prints on the screen all generated lists. But most of the time, the difference from one generated list to another is only in 1 element (in this case, number), example:

['01', '02', '03', '05', '06']
['01', '02', '03', '05', '07']
['01', '02', '03', '05', '08']

That is, only the last element has changed.

How do I make the difference be at least 2 elements in all generated lists? We know that for one result to be different from another it is necessary that only 1 element change and this my code already does, but how to make it to be 2? Example:

['01', '02', '03', '05', '06']
['01', '02', '03', '07', '08']
['02', '03', '04', '05', '07']

We can see that in addition to being single lists, there are at least 2 distinct elements from one list to another. That's what I want to do.

possibly something similar to : https://stackoverflow.com/questions/3462143/get-difference-between-two-lists using set differences/subtraction? — jmunsch, Apr 20 '18 at 17:57
didn't you post a very similar question earlier in the day? which got downvoted? — Jean-François Fabre, Apr 20 '18 at 18:14
@jmunsch The question you sent me is similar, but I do not know if it would solve my problem. I can not think of the logic to solve this. — Geek, Apr 20 '18 at 18:20
@Jean-FrançoisFabre Yes, but I had not expressed myself well. — Geek, Apr 20 '18 at 18:22
How do you which lists should compare differences? For instance, in your last example, the first list is `['01', '02', '03', '05', '06']`, but there is no reason it could start with `['01', '02', '03', '05', '07']`, which produces different results. — pylang, Apr 21 '18 at 02:32
@pylang If you observe, each list differs 2 elements from the others. You must be thinking about the items in the lists, but I'm talking about the list itself altogether. It's like a list being unique 2 times, because instead of just changing 1 element from one list to another, they it must have at least 2 different elements from all lists. Excuse my English, maybe it's a little bad — Geek, Apr 21 '18 at 12:14
Notice the first list, it has the following:` `['01', '02', '03', '05', '06']` Now look at list 2, it has the following: `['01', '02', '03', '07', '08']` And the third list has: `['02', '03', '04', '05', '07']` In other words, the last two elements of list 2 do not exist in the first. Now see the third list comparing with the first one: the elements '04' and '07' do not exist in the first. And so also the elements '04' and '05' that are in the third list are not in the second list. That is, there are at least 2 elements of difference from one list to another. Understood? — Geek, Apr 21 '18 at 12:26
Yes I see. So each list when compared to **any** other list must differ by two items. That is a thorough response thank you. — pylang, Apr 21 '18 at 15:49
@pylang I do not understand what you mean I need it to be at least 2 different as I explained, or if possible more than that. The results were used for analysis. — Geek, Apr 24 '18 at 00:30

score 0 · Answer 1 · answered May 03 '18 at 19:14

There must be a better way to solve this, but here is a naive attempt at resolving your question(s). This solution presents results you may not realize.

We build a dictionary of tuples with some relationship with 1) the key or 2) the key and each other in a set of tuples.

Given

import copy
import itertools as it
import collections as ct

import nose.tools as nt


a = ["01", "02", "03"]
b = ["02", "03", "04"]
c = ["04", "05", "03"]
d = ["05", "07", "06"]
e = ["08", "06", "07"]


# Helpers
def _is_quasi_unique(by=3, item=None, seen=None):
    """Return True if `item` differs with all `seen` items `by` some number of elements."""
    if seen is None:
        seen = set()
    return all((len(set(s) - set(item)) >= by) for s in seen)


def _has_repeats(iterable):
    """Return True if repeated elements are in `iterable`."""
    return not (len(iterable) == len(set(iterable)))


def _has_permutes(x, iterable):
    """Return True if `x` equates to any permutated item in `iterable`."""
    for item in iterable:
        if set(x) == set(item):
            return True
    return False


def _clean_subsets(subsets):
    """Yield quasi-unique, non-repeated subsets."""
    seen = set()
    for subset in subsets:
        if _has_permutes(subset, seen) or _has_repeats(subset):
            continue
        seen.add(subset)
        yield subset

Code

# Secondary Functions
def get_cluster(subsets, seed, n=2, exclude=True):
    """Return a set of quasi-unique tuples; permit `n` similar elements.

    `seed` is a set, a starting pool of related items.

    """
    if exclude:
        set_ = seed
    else:
        set_ = copy.copy(seed)

    for subset in subsets:
        if _is_quasi_unique(n, subset, seed):
            set_.add(subset)
    return set_


def get_clean_subsets(*iterables):
    """Yield clean subsets."""
    iterables = it.product(*iterables)
    yield from _clean_subsets(iterables)


# Primary Functions
def get_clusters(subsets, n=2, exclude=True):
    """Return a dict of clusters of unique items."""
    clusters = {}
    everseen = set()
    for subset in subsets:
        if subset not in everseen:
            clusters[subset] = get_cluster(subsets, {subset}, n=n, exclude=exclude)
            everseen |= clusters[subset]
    return clusters


def find_related_items(*iterables, n=2, exclude=True):
    """Return a dict of key-cluster pairs.

    Parameters
    ----------
    iterables : tuple
        Input lists
    n : int
        Number of allowed similar elements between subsets.
    exclude: bool
        Make clusters with subsets unique within each cluster.

    """
    subsets = list(get_clean_subsets(*iterables))
    return get_clusters(subsets, n=n, exclude=exclude)

Demo

Find sets of tuples that are unique to all other tuples (subsets)

>>> find_related_items(a, b, c, d, e, n=2, exclude=False)
{('01', '02', '04', '05', '06'): 
    {('01', '02', '03', '05', '08'),
     ('01', '02', '03', '06', '08'),
     ('01', '02', '03', '07', '06'),
     ('01', '02', '03', '07', '08'),
     ('01', '02', '04', '05', '06'),
     ('01', '02', '04', '07', '08'),
     ('01', '02', '05', '07', '08'),
     ('01', '03', '04', '05', '07'),
     ('01', '03', '04', '05', '08'),
     ('01', '03', '04', '06', '08'),
     ('01', '03', '04', '07', '06'),
     ('01', '03', '04', '07', '08'),
     ('01', '03', '05', '06', '08'),
     ('01', '03', '05', '07', '06'),
     ...},
...}

Find sets of tuples that are unique to other tuples in the each set (cluster)

>>> find_related_items(a, b, c, d, e, n=2)
{('01', '02', '03', '05', '07'): 
    {('01', '02', '03', '05', '07'),
     ('01', '02', '03', '06', '08'),
     ('01', '02', '04', '05', '08'),
     ('01', '02', '04', '07', '06'),
     ('01', '03', '04', '05', '06'),
     ('01', '03', '04', '07', '08')},
 ('01', '02', '04', '05', '06'): 
    {('01', '02', '03', '05', '08'),
     ('01', '02', '03', '07', '06'),
     ('01', '02', '04', '05', '06'),
     ('01', '02', '04', '07', '08'),
     ('01', '03', '04', '05', '07'),
     ('01', '03', '04', '06', '08')},
 ...}

Details

Terms

subsets: tuples generated from the Cartesian product of input items.
clusters: groups of quasi-unique tuples.
quasi-unique: two tuples are quasi-unique if all elements are disjoint except for n similar elements shared between them.

Primary Functions

find_related_items() : this function returns clean, key-cluster pairs. Subsets are cleaned by discaring tuples with permutated or repeated elements (via get_clean_subsets()). Clusters are built by calling get_clusters().
get_clusters(): this function produces a dictionary of key-cluster pairs by iterating the subsets. All observed subsets are recorded internally in the everseen set. Only unique observations are added as keys to the dictionary.

Two Results

These tuples in each cluster are "unique" with respect to either:

the key (A), e.g. A - B, A - C, (exclude=False)
the key and all other members in the cluster, e.g. A - B, B - C, C - A, ..., (exclude=True)

Notes

Clusters are built from _get_cluster(), which relies on an initial starting tuple. This starting tuple is needed to compared against all other subsets in order to return a cluster of quasi-unique tuples. Starting tuples are recorded as keys of the dictionary only to indicate what other tuples in a cluster were compared against. Thus, keys are arbitrary and less critical than the clusters.
While the general patterns should be consistent, the final results may be contingent on the starting tuples.

Generate unique lists from other

1 Answers1