2

I want to write a function that extracts elements from deep nested tuples and lists, say I have something like this

l = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['d', 'e', 'f'])])

And I want a flat list without 'THIS' and 'THAT':

list = ['a', 'b', 'c', 'd', 'e', 'f']

Here's what I have so far:

def extract(List):
    global terms
    terms = []
    for i in word:
        if type(i) is not str:
            extract(i)
        else:
            if i is not "THIS" and i is not "THAT":
                terms.append(i)
    return terms

But I keep getting list = ['d', 'e', 'f'], it looks like the terms = [] is set again after looping to 'c'.

wim
  • 338,267
  • 99
  • 616
  • 750
efsee
  • 579
  • 1
  • 10
  • 22
  • Just out of curiosity, are you really looking to eliminate this and that specifically, or the first element of every nested tuple in general if it's a string? – Mad Physicist Mar 13 '18 at 04:11

4 Answers4

6

You're doing terms = [] at the top of the function, so of course every time you recursively call the function, you're doing that terms=[] again.

The quickest solution is to write a simple wrapper:

def _extract(List):
    global terms
    for i in word:
        if type(i) is not str:
            _extract(i)
        else:
            if i is not "THIS" and i is not "THAT":
                terms.append(i)
    return terms

def extract(List):
    global terms
    terms = []
    return _extract(List)

One more thing: You shouldn't use is to test for string equality (except in very, very special cases). That tests that they're the same string object in memory. It will happen to work here, at least in CPython (because both "THIS" strings are constants in the same module—and even if they weren't, they'd get interned)—but that's not something you want to rely on. Use ==, which tests that they both mean the same string, whether or not they're actually the identical object.

Testing types for identity is useful a little more often, but still not usually what you want. In fact, you usually don't even want to test types for equality. You don't often have subclasses of str—but if you did, you'd probably want to treat them as str (since that's the whole point of subtyping). And this is even more important for types that you do subclass from more often.

If you don't completely understand all of that, the simple guideline is to just never use is unless you know you have a good reason to.

So, change this:

if i is not "THIS" and i is not "THAT":

… to this:

if i != "THIS" and i != "THAT":

Or, maybe even better (definitely better if you had, say, four strings to check instead of two), use a set membership test instead of anding together multiple tests:

if i not in {"THIS", "THAT"}:

And likewise, change this:

if type(i) is not str:

… to this:

if not isinstance(i, str):

But while we're being all functional here, why not use a closure to eliminate the global?

def extract(List)
    terms = []
    def _extract(List):
        nonlocal terms
        for i in word:
            if not isinstance(i, str):
                _extract(i)
            else:
                if i not in {"THIS", "THAT"}:
                    terms.append(i)
        return terms
    return _extract(List)

This isn't the way I'd solve this problem (wim's answer is probably what I'd do if given this spec and told to solve it with recursion), but this has the virtue of preserving the spirit of (and most of the implementation of) your existing design.

abarnert
  • 354,177
  • 51
  • 601
  • 671
  • 1
    Would you mind share what you would solve it, I'm willing to learn any simpler solutions, thanks. – efsee Mar 13 '18 at 03:47
  • 2
    @efsee See [wim's answer](https://stackoverflow.com/a/49247980/908494), which is pretty much what I'd do if given this spec and asked to solve this recursively. – abarnert Mar 13 '18 at 04:06
  • @DSM I left that alone from the OP's original code. But you're right, it probably is worth explaining why it's a bad idea. Edited. – abarnert Mar 14 '18 at 18:29
  • Much obliged. :-) – DSM Mar 14 '18 at 18:31
  • @DSM While fixing that, I noticed he was also testing for type equality. – abarnert Mar 14 '18 at 18:33
4

It will be good to separate the concerns of "flattening" and "filtering". Decoupled code is easier to write and easier to test. So let's first write a "flattener" using recursion:

from collections import Iterable

def flatten(collection):
    for x in collection:
        if isinstance(x, Iterable) and not isinstance(x, str):
            yield from flatten(x)
        else:
            yield x

Then extract and blacklist:

def extract(data, exclude=()):
    yield from (x for x in flatten(data) if x not in exclude)

L = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['d', 'e', 'f'])])
print(*extract(L, exclude={'THIS', 'THAT'}))
wim
  • 338,267
  • 99
  • 616
  • 750
  • 2
    I suppose someone might not like that you didn't explain what was wrong with the OP's code, and instead ignored it and wrote something very different. Personally, I like it, and I think the OP can probably understand it and learn from it (maybe after a bit of reading up on generators or asking for help), so I upvoted. – abarnert Mar 13 '18 at 04:08
0

Assuming that the first element of each tuple can be disregarded, and we should recurse with list that is the second element, we can do this:

def extract(node):
    if isinstance(node, tuple):
        return extract(node[1])
    if isinstance(node, list):
        return [item for sublist in [extract(elem) for elem in node] for item in sublist]
    return node

The list comprehension is a little dense, here's the same with loops:

def extract(node):
    if isinstance(node, tuple):
        return extract(node[1])
    if isinstance(node, list):
        result = []
        for item in node:
            for sublist in extract(item):
                for elem in sublist:
                    result.append(elem)
        return result
    return node
Mark Rogaski
  • 111
  • 1
  • 2
-1

This iterative function should do the trick alongside the .extend() list operator.

def func(lst):
    new_lst = []
    for i in lst:
        if i != 'THAT' and i != 'THIS':
            if type(i) == list or type(i) == tuple: 
                new_lst.extend(func(i))
            else: new_lst.append(i)
    return new_lst

l = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['dk', 'e', 'f'])])
print(func(l))

['a', 'b', 'c', 'dk', 'e', 'f']

JahKnows
  • 2,618
  • 3
  • 22
  • 37
  • This will "expand" all strings with length > 1. – wim Mar 13 '18 at 03:45
  • Edited and it works now. Just changed the second .extend to .append. – JahKnows Mar 13 '18 at 03:51
  • 2
    Don't use `is` for equality checks. This solution relies on implementation detail of string interning. – wim Mar 13 '18 at 03:55
  • "Don't use `is` for equality checks." Why? – JahKnows Mar 13 '18 at 03:56
  • 4
    @JahKnows Because `i is not 'THAT'` can be true even when `i == 'THAT'`. In this case, in CPython, it _won't_ be true, but, as wim says, that's only because of the implementation details of string interning (and/or module constants). If you call some C API function that builds `"THAT"` differently, it won't have the same id as the `'THAT"` constant in your function. If you carefully explained those implementation details, and had a good reason for doing something unusual, then it _might_ be ok, but in general, don't do it. – abarnert Mar 13 '18 at 04:10
  • Good to know. Changed it to use logical `==`. I don't get why my solution, elegant and simple to read, despite some really arbitrary caveats, WORKS. Yet, has -2. – JahKnows Mar 13 '18 at 04:14
  • I'm not sure either. This is not as general as wim's, but it's actually a lot simpler in some ways. Definitely a +1 – Mad Physicist Mar 13 '18 at 04:17
  • Well, if it didn't work for quite some time, and had some major caveats, the people who saw and downvoted that version of the answer are unlikely to come back and see the new version and change their vote. – abarnert Mar 13 '18 at 04:43
  • @abarnert, what were the "major" caveats. The use of `is in` rather than `==` when it works in Python. Or was it a use case that was outside the scope of the OP's question? Seems to me it's just overly competitive people downvoting answers such that theirs gets seen. My answer worked from the start. I know wim downvoted. I think that's a low blow on his part. – JahKnows Mar 13 '18 at 05:06
  • 2
    @JahKnows `is` instead of `==` does not "work in Python". It works in some cases, and not others—and different cases in different Python implementations. And again, there are almost always early driveby downvotes for answers with problems (and even more when those problems are pointed out by a respected user like wim), and people almost never retract them—probably because they never revisit the question again after their initial votes. It's not worth getting worked up over. If I can't guess what a downvote is for, I ask in hopes of improving my answer, but I don't expect to get points back. – abarnert Mar 13 '18 at 05:11
  • @abarnert, I appreciate the feedback to make my code better and I did implement them. They are really good pointers. However, the code worked from the start given the OP's requirements. – JahKnows Mar 13 '18 at 05:14
  • It looks better now. The `if type ...` stuff could probably be improved to use `isinstance(i, (list, tuple))` so that it doesn't randomly break with inheritance (e.g. `namedtuple`). – wim Mar 13 '18 at 05:48