Getting all unique strings from a list of nested lists and tuples

Question

Is there a fast way to get the unique elements, especially the strings, from a list or tuple of nested lists and tuples? Strings like 'min' and 'max' should be removed. The lists and tuples can be nested in any possible way. The only elements that always keep the same shape are the innermost tuples, such as ('a',0,49), which contain the strings.

For example, given this list or tuple:

lst1=[[(('a',0,49),('b',0,70)),(('c',0,49))],
     [(('c',0,49),('e',0,70)),(('a',0,'max'),('b',0,100))]]

tuple1=([(('a',0,49),('b',0,70)),(('c',0,49))],
     [(('c',0,49),('e',0,70)),(('a',0,'max'),('b',0,100))])

Wanted Output:

uniquestrings = ['a','b','c','e']

What I tried so far:

flat_list = list(sum([item for sublist in x for item in sublist],()))

But this does not reach the "core" of the nested object.

Go through each element in the list, get the unique elements from it (as shown here: https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists, and there are many other links) -> store those in a new list -> remove duplicates from this new list — kosist, Nov 01 '18 at 08:21
@AnagnostouJohn No, it could be any possible nested list or tuple; the only elements which always keep the same shape are the core tuples like ('a',0,49) — Varlor, Nov 01 '18 at 08:37
Why/how should ("a",0,"max") not be a valid tuple? Do you want only those that are in both inner lists? Please explain more carefully what you want and how you decide whether a tuple is "worth" being put into your result. — Patrick Artner, Nov 01 '18 at 08:54
No, they are worth including. I am just saying that those tuples contain the strings I need and that they always have the same shape — Varlor, Nov 01 '18 at 08:55
@Varlor you can filter the flattened list for strings like this: `lst1 = [x for x in lst1 if isinstance(x, str)]` — Rezvanov Maxim, Nov 01 '18 at 09:04

score 2 · Answer 1 · answered Nov 01 '18 at 08:31

# generator-based flatten
def flatten(lst):
    for x in lst:
        if isinstance(x, (list, tuple)):
            # recurse into nested containers
            yield from flatten(x)
        else:
            yield x

# source list (or tuple)
lst1 = [[(('a', 0, 49), ('b', 0, 70)), (('c', 0, 49))],
        [(('c', 0, 49), ('e', 0, 70)), (('a', 0, 'max'), ('b', 0, 100))]]

# keep every third element (the string at position 0 of each 3-tuple)
lst1 = list(flatten(lst1))[::3]
# >>> ['a', 'b', 'c', 'c', 'e', 'a', 'b']

# drop duplicates and sort the result
lst1 = sorted(set(lst1))
# >>> ['a', 'b', 'c', 'e']
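
The `[::3]` slice only works while every innermost tuple has exactly three items with the string first. A minimal sketch of an alternative that picks out the innermost tuples by structure instead of by position in the flattened stream. The helper name `core_tuples` and the rule "a core tuple is a non-empty tuple that contains no list or tuple" are assumptions made for illustration, not part of the answer above.

```python
def core_tuples(obj):
    """Yield innermost tuples: non-empty tuples with no nested list/tuple."""
    if isinstance(obj, (list, tuple)):
        is_core = (
            isinstance(obj, tuple)
            and obj
            and not any(isinstance(x, (list, tuple)) for x in obj)
        )
        if is_core:
            yield obj
        else:
            # still a container of containers, so keep descending
            for item in obj:
                yield from core_tuples(item)


lst1 = [[(('a', 0, 49), ('b', 0, 70)), (('c', 0, 49))],
        [(('c', 0, 49), ('e', 0, 70)), (('a', 0, 'max'), ('b', 0, 100))]]

# take the first item of each core tuple, drop duplicates, sort
names = sorted({t[0] for t in core_tuples(lst1)})
print(names)  # ['a', 'b', 'c', 'e']
```

This still assumes the wanted string sits at index 0 of each core tuple, but it no longer depends on the tuple length.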

Patrick Artner · Accepted Answer · 2018-11-01T09:49:43.910

2

This will get any string inside the given iterable, regardless of position inside the iterable:

def isIterable(obj):
    # kudos: https://stackoverflow.com/a/1952481/7505395
    try:
        iter(obj)
        return True
    except TypeError:
        return False

# shortcut
def isString(x):
    return isinstance(x, str)

def chainme(iterab):
    # strings are iterable too, so skip those from chaining
    if isIterable(iterab) and not isString(iterab):
        for a in iterab:
            yield from chainme(a)
    else:
        yield iterab

lst1=[[(('a',0,49),('b',0,70)),(('c',0,49))],
     [(('c',0,49),('e',0,70)),(('a',0,'max'),('b',0,100))]]

tuple1=([(('a',0,49),('b',0,70)),(('c',0,49))],
     [(('c',0,49),('e',0,70)),(('a',0,'max'),('b',0,100))]) 


for k in [lst1,tuple1]:
    # use only strings
    l = [x for x in chainme(k) if isString(x)]
    print(l)
    print(sorted(set(l)))
    print()

Output:

['a', 'b', 'c', 'c', 'e', 'a', 'max', 'b'] # list
['a', 'b', 'c', 'e', 'max']                # sorted set of list

['a', 'b', 'c', 'c', 'e', 'a', 'max', 'b']
['a', 'b', 'c', 'e', 'max']
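
The output above still contains 'max', which the question asks to drop. One possible follow-up step is sketched here, assuming the unwanted strings are known in advance. The names `EXCLUDED`, `iter_leaves`, and `unique_strings` are illustrative and do not come from the answer.

```python
EXCLUDED = {"min", "max"}  # assumption: the complete set of strings to drop


def iter_leaves(obj):
    """Recursively yield the non-container items of nested lists/tuples."""
    if isinstance(obj, (list, tuple)):
        for item in obj:
            yield from iter_leaves(item)
    else:
        yield obj


def unique_strings(nested, excluded=EXCLUDED):
    """Return the sorted unique strings in `nested`, minus `excluded`."""
    return sorted({x for x in iter_leaves(nested)
                   if isinstance(x, str) and x not in excluded})


lst1 = [[(('a', 0, 49), ('b', 0, 70)), (('c', 0, 49))],
        [(('c', 0, 49), ('e', 0, 70)), (('a', 0, 'max'), ('b', 0, 100))]]

print(unique_strings(lst1))  # ['a', 'b', 'c', 'e']
```

Filtering by value rather than by position keeps the approach independent of where a string sits inside its tuple.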

edited Nov 01 '18 at 09:49

answered Nov 01 '18 at 08:45

Patrick Artner

But what is the difference from Rezvanov Maxim's method? The "isinstance" method returns False for a string – Varlor Nov 02 '18 at 09:29
@Varlor The difference is visible in the output. After chaining all elements, Rezvanov uses only every 3rd element, hardcoded by the `[::3]` list slice in `lst1 = list(flatten(lst1))[::3]`. That means he looks only at the bold elements: **'a'**,0,49,**'b'**,0,70,**'c'**,0,49,**'c'**, 0, 49,**'e'**, 0, 70, **'a'**, 0, 'max',**'b'**, 0, 100. The rest is sliced away and not considered. He gets the strings at position 0 of each inner 3-tuple. I get all strings regardless of where in the tuple they are [continued] – Patrick Artner Nov 02 '18 at 10:52
[continued] which I think is closer to your question's title, **Getting all unique strings from a list of nested lists and tuples**. SO strives to build a database for users who, 3, 6, or 9 months from now, search for something, happen to find your question, and can use the given answers to solve their actual problem. – Patrick Artner Nov 02 '18 at 10:54
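
The difference discussed in these comments can be reproduced side by side. The sketch below flattens the question's `lst1` once and then applies both selection rules. The variable names `by_position` and `by_type` are illustrative only.

```python
def flatten(obj):
    """Yield every non-list, non-tuple item in nesting order."""
    for item in obj:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


lst1 = [[(('a', 0, 49), ('b', 0, 70)), (('c', 0, 49))],
        [(('c', 0, 49), ('e', 0, 70)), (('a', 0, 'max'), ('b', 0, 100))]]

flat = list(flatten(lst1))

# rule 1: every third element, i.e. index 0 of each 3-tuple
by_position = flat[::3]

# rule 2: every string, wherever it sits in its tuple
by_type = [x for x in flat if isinstance(x, str)]

print(by_position)  # ['a', 'b', 'c', 'c', 'e', 'a', 'b']
print(by_type)      # ['a', 'b', 'c', 'c', 'e', 'a', 'max', 'b']
```

Only the type-based rule picks up 'max', because it sits at index 2 of its tuple and the slice never looks there.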

score 1 · Answer 3 · edited Nov 01 '18 at 09:52

from collections.abc import Iterable

def flatten(l):
    for el in l:
        # recurse into any iterable except strings and bytes
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

# keep alphabetic strings, excluding 'min' and 'max'
[x for x in set(flatten(lst1)) if str(x).isalpha() and str(x) not in ("min", "max")]

You can use the flatten code as defined here: Flatten an irregular list of lists

edited Nov 01 '18 at 09:52

Patrick Artner

answered Nov 01 '18 at 08:46

Cua
