4

My very first post and question here...

Let list_a be a list of lists:

list_a = [[2,7,8], [3,4,2], [5,10], [4], [2,3,5]...]

Let list_b be another list of integers: list_b = [5,7]

I need to exclude every list in list_a that contains at least one item from list_b. The result for the example above should look like list_c = [[3,4,2], [4]...]

If list_b was not a list but a single number b, then one could define list_c in one line as:

list_c = [x for x in list_a if not b in x]

I am wondering whether it is possible to write an elegant one-liner for the case where list_b holds several values. Of course, I can just loop through all of list_b's values, but maybe there is a faster option?

DevLounge
DimaWest
  • I can see you have various answers below. But as a developer, I would like to point out that you should avoid writing one-liners as much as possible. At first they always seem wonderful, but when troubleshooting, they are much harder to debug and maintain. – Omrum Cetin Apr 12 '21 at 00:03
  • I agree with your point, but in this particular case I don't want a one-liner for its own sake. I have to do this operation millions of times with lists containing up to 100000 lists as items, so every tiny cut in execution time helps me. – DimaWest Apr 12 '21 at 06:43

3 Answers

4

You can express the logic "all sublists in A where none of the elements of B are in the sublist" with a list comprehension like:

A = [[2,7,8], [3,4,2], [5,10], [4], [2,3,5]]

B = [5,7]

[l for l in A if not any(n in l for n in B)]
# [[3, 4, 2], [4]]

The condition any(n in l for n in B) will be true if any element, n, of B is in the sublist, l, from A. Using not we can take the opposite of that.
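To make the condition concrete, here is a small sketch that evaluates it for each sublist separately before filtering. The names hits and kept are only illustrative and are not part of the answer above.

```python
A = [[2, 7, 8], [3, 4, 2], [5, 10], [4], [2, 3, 5]]
B = [5, 7]

# For each sublist l, record whether it contains at least one element of B.
hits = [any(n in l for n in B) for l in A]

# Keep only the sublists for which the condition is False.
kept = [l for l, hit in zip(A, hits) if not hit]

print(hits)
print(kept)
```

The first, third and last sublists contain 7 or 5, so only the remaining two survive the filter.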

Mark
4

Let's first consider the task of checking an individual element of list_a - such as [2,7,8] - because no matter what, we're conceptually going to need a way to do that, and then we're going to apply that check to the whole list with a list comprehension. I'll use a as the name for such a list, and b for an element of list_b.

The straightforward way to write this is using the any builtin, which works elegantly in combination with generator expressions: any(b in a for b in list_b).

The logic is simple: we create a generator expression (like a lazily-evaluated list comprehension) to represent the result of the b in a check applied to each b in list_b. We create those by replacing the [] with (); but due to a special syntax rule we may drop these when using it as the sole argument to a function. Then any does exactly what it sounds like: it checks (with early bail-out) whether any of the elements in the iterable (which includes generator expressions) is truthy.
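The early bail-out can be observed with a small sketch. The helper contains and the list checked are hypothetical, added only to record which values actually get tested; the order of list_b is chosen so that the first value already matches.

```python
checked = []  # records every b that was actually tested


def contains(a, b):
    """Hypothetical helper: log the tested value, then do the membership check."""
    checked.append(b)
    return b in a


a = [2, 7, 8]
list_b = [7, 5]

# The generator expression is consumed lazily, so any() stops
# as soon as one check comes back truthy.
result = any(contains(a, b) for b in list_b)

print(result)
print(checked)
```

Because 7 is found in a right away, 5 is never tested. With a list comprehension in place of the generator expression, both checks would run before any saw the first result.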


However, we can likely do better by taking advantage of set intersection. The key insight is that the test we are trying to do is symmetric; considering the test between a and list_b (and coming up with another name for elements of a), we could equally have written any(x in list_b for x in a), except that it's harder to understand that.

Now, it doesn't help to make a set from a, because we have to iterate over a anyway in order to do that. (The generator expression does that implicitly; the in operator, when used for list membership, requires iteration.) However, if we make a set from list_b, then we can do that once, ahead of time, and just have any(x in set_b for x in a).
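As a rough sketch of this intermediate step, using the sample data from the question (without its elided tail), the set is built once and each sublist is then checked element by element:

```python
list_a = [[2, 7, 8], [3, 4, 2], [5, 10], [4], [2, 3, 5]]
list_b = [5, 7]

# Build the set once, outside the comprehension.
set_b = set(list_b)

# Each "x in set_b" check is a hash lookup rather than a scan of list_b.
list_c = [a for a in list_a if not any(x in set_b for x in a)]

print(list_c)
```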

But that, in turn, is a) as described above, hard to understand; and b) overlooking the built-in machinery of sets. The operator & normally used for set intersection requires a set on both sides, but the named method .intersection does not. Thus, set_b.intersection(a) does the trick.
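The difference between the operator and the named method can be checked directly. This is only a small sketch; the variable names are illustrative.

```python
set_b = {5, 7}
a = [2, 7, 8]

# The & operator refuses a list on the right-hand side.
try:
    set_b & a
except TypeError:
    operator_accepts_list = False
else:
    operator_accepts_list = True

# The named method accepts any iterable, including a plain list.
common = set_b.intersection(a)
nothing = set_b.intersection([3, 4, 2])

# An empty set is falsy, which is what makes "not set_b.intersection(a)" work.
print(operator_accepts_list, common, bool(nothing))
```

If only a yes/no answer is needed, set_b.isdisjoint(a) also accepts any iterable and returns a boolean directly; whether it is faster for a given workload would need to be measured.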


Putting it all together, we get:

set_b = set(list_b)
list_c = [a for a in list_a if not set_b.intersection(a)]
Karl Knechtel
  • upvote for also going with set intersection + making me realise that I can pass a list to set.intersection instead of a set as I always assumed until now ;-) – DevLounge Apr 12 '21 at 00:03
  • Thank you for the explanation and solution. I compared the run times of the following three options: 1) the any solution from Mark M: [l for l in A if not any(n in l for n in B)]; 2) a consecutive loop through list_b's values; 3) the set_b.intersection solution from Karl Knechtel. The run times of options 1) and 2) were comparable, while the third option was the fastest. In my test examples the set_b.intersection method was about 5 times faster! – DimaWest Apr 12 '21 at 08:16
2

Mark's answer is good but hard to read.

FYI, you can also leverage sets:

>>> set_b = set(list_b)
>>> [l for l in list_a if not set_b.intersection(l)]
[[3, 4, 2], [4]]
DevLounge