
Say I have the following list of lists of names:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]

I want to return only the "Matts" in the list, but I also want to keep the list-of-lists structure. So I want to return:

[['Matt', 'Matt'], ['Matt']]

I've tried something like this, but it flattens everything into one big list:

matts = [name for namelist in names for name in namelist if name=="Matt"]

I know the following works, but I want to avoid explicitly iterating through the lists and appending. Is this possible?

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
matts = []
for namelist in names:
    matts_namelist = []  # one filtered sublist per input sublist
    for name in namelist:
        if name == "Matt":
            matts_namelist.append(name)
    matts.append(matts_namelist)
  • Is this a 2-dimensional list, or an n-dimensional list? Could you have `[['Matt', ['Matt', 'Paul', ['Paul']], 'Paul', 'Matt', 'Matt'], 'Matt', 'Paul']`? – Joshua Voskamp Nov 02 '21 at 14:44
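If the list can be nested to arbitrary depth, as this comment asks, a single nested comprehension no longer fits. Below is a minimal recursive sketch, assuming the only containers are lists and every leaf is a string compared with `==`. The helper name `keep_matching` is hypothetical. Empty sublists are kept so that the original nesting is preserved.

```python
def keep_matching(items, needle="Matt"):
    """Hypothetical helper: keep only leaves equal to needle, preserving nesting.

    Assumes containers are plain lists and leaves are strings.
    """
    result = []
    for element in items:
        if isinstance(element, list):
            # Recurse so each sublist keeps its position, even if it ends up empty.
            result.append(keep_matching(element, needle))
        elif element == needle:
            result.append(element)
    return result


nested = [['Matt', ['Matt', 'Paul', ['Paul']], 'Paul', 'Matt', 'Matt'], 'Matt', 'Paul']
print(keep_matching(nested))  # [['Matt', ['Matt', []], 'Matt', 'Matt'], 'Matt']
```

For the two-level input from the question, this returns the same `[['Matt', 'Matt'], ['Matt']]` as the flat answers.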

4 Answers


Use a nested list comprehension, as below:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
res = [[name for name in lst if name == "Matt"] for lst in names]
print(res)

Output

[['Matt', 'Matt'], ['Matt']]

The above nested list comprehension is equivalent to the following for-loop:

res = []
for lst in names:
    res.append([name for name in lst if name == "Matt"])
print(res)

A third, functional alternative uses filter and partial:

from operator import eq
from functools import partial

names = [['Matt', 'Matt', 'Paul'], ['Matt']]

eq_matt = partial(eq, "Matt")
res = [[*filter(eq_matt, lst)] for lst in names]
print(res)

Micro-Benchmark

%timeit [[*filter(eq_matt, lst)] for lst in names]
56.3 µs ± 519 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [[name for name in lst if "Matt" == name] for lst in names]
26.9 µs ± 355 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Setup (of micro-benchmarks)

import random
population = ["Matt", "James", "William", "Charles", "Paul", "John"]
names = [random.choices(population, k=10) for _ in range(50)]

Full Benchmark

Candidates

from functools import partial
from operator import eq


def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]


def functional_approach(names, needle="Matt"):
    eq_matt = partial(eq, needle)  # predicate equivalent to: needle == x
    return [[*filter(eq_matt, lst)] for lst in names]


def count_approach(names, needle="Matt"):
    # Every kept item equals needle, so repeating it count() times is enough.
    return [[needle] * lst.count(needle) for lst in names]

(Plot of alternative solutions)

The above results were obtained for a list whose length varies from 100 to 1000 elements, where each element is a list of 10 strings chosen at random from a population of 14 strings (names). The code for reproducing the results can be found here. As the plot shows, the most performant solution is the one from @rv.kvetch.
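The link to the reproduction code did not survive, so here is a minimal sketch of a comparable sweep using only the standard library's `timeit`. Assumptions: it reuses the six-name population from the micro-benchmark setup rather than the 14 names mentioned, it fixes a random seed, and the helpers `make_names` and `run_sweep` are hypothetical. It also checks that the three candidates return identical results before timing them. Absolute numbers will depend on the machine and Python version.

```python
import random
import timeit
from functools import partial
from operator import eq


def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]


def functional_approach(names, needle="Matt"):
    eq_needle = partial(eq, needle)
    return [[*filter(eq_needle, lst)] for lst in names]


def count_approach(names, needle="Matt"):
    return [[needle] * lst.count(needle) for lst in names]


CANDIDATES = [nested_list_comprehension, functional_approach, count_approach]

# Assumption: six names (from the micro-benchmark setup), not the 14 used for the plot.
POPULATION = ["Matt", "James", "William", "Charles", "Paul", "John"]


def make_names(n, k=10, seed=0):
    """Hypothetical helper: n sublists of k names, drawn with a fixed seed."""
    rng = random.Random(seed)
    return [rng.choices(POPULATION, k=k) for _ in range(n)]


def run_sweep(sizes=range(100, 1001, 100), number=50, repeat=3):
    """Hypothetical helper: best-of-`repeat` seconds per call, keyed by (name, n)."""
    timings = {}
    for n in sizes:
        names = make_names(n)
        expected = nested_list_comprehension(names)
        for func in CANDIDATES:
            # Sanity check: every candidate must produce the same nested result.
            assert func(names) == expected, func.__name__
            runs = timeit.repeat(lambda: func(names), number=number, repeat=repeat)
            timings[(func.__name__, n)] = min(runs) / number
    return timings


if __name__ == "__main__":
    for (name, n), seconds in sorted(run_sweep().items()):
        print(f"{name:26} n={n:4d} {seconds * 1e6:9.1f} µs")
```

Taking the minimum of the repeats, rather than the mean, is a common choice for micro-benchmarks because it is least affected by background noise.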

Dani Mesejo

An alternate way using list.count:

>>> names = [['Matt', 'Matt', 'Paul'], [], ['Matt']]
>>> [name.count('Matt') * ['Matt'] for name in names]
[['Matt', 'Matt'], [], ['Matt']]

You could also try with itertools.repeat:

>>> import itertools
>>> [[*itertools.repeat('Matt', name.count('Matt'))] for name in names]
[['Matt', 'Matt'], [], ['Matt']]

Lastly, as suggested by @DaniMesejo, you could also use the range iterator within a nested list comprehension:

>>> [['Matt' for _ in range(name.count('Matt'))] for name in names]
[['Matt', 'Matt'], [], ['Matt']]
rv.kvetch
  • I also had the same idea :) But when used with a mutable data structure, changing one element will be reflected in the other elements too (https://stackoverflow.com/q/240178/12416453) – Ch3steR Nov 02 '21 at 15:00
  • Wow, lot of good points here, will certainly update the answer to mention those cases. – rv.kvetch Nov 02 '21 at 15:05
  • @DaniMesejo Agreed. Interestingly, using your benchmarking setup, the `count*["Matt"]` solution is 50% faster, possibly because it doesn't have to create extra objects (the same object `"Matt"` is in each sublist). – Ch3steR Nov 02 '21 at 15:07
    Actually, I did try with `matts[0][0] = 'John'`, and interestingly it only affected the first nested element. – rv.kvetch Nov 02 '21 at 15:09
    @rv.kvetch `str` is immutable, so it creates a new string object. That weird effect shows up with mutable data structures like lists, sets, and dicts. – Ch3steR Nov 02 '21 at 15:11
    @rv.kvetch You'd have to use some mutating function on the object to observe it. For example, `a = [[0]]*2; a[0][0] = 10` would change both sublists, since they are the same object. Had it been a list instead of "Matt", you could observe the change. – Ch3steR Nov 02 '21 at 15:15
  • @DaniMesejo The reason may not be the copy. At least in CPython, the interpreter uses the same str object everywhere (https://stackoverflow.com/questions/60429672/two-different-string-object-with-same-value-in-python). My guess is that it's because `list * num` is implemented in C. – Ch3steR Nov 02 '21 at 15:20
  • @Ch3steR ah, good point. I looked closer at the linked post, and it looks like that mutating effect is because the outermost list was also being repeated. Since it's only the inner list that's being repeated here, I daresay we should be safe from any mutative effects. – rv.kvetch Nov 02 '21 at 15:25
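A small sketch of the aliasing issue discussed in these comments: `n * [x]` repeats a reference to the same object `n` times, which only becomes visible when that object is mutated in place. In the count-based answer only the inner list of strings is repeated, and assigning to a slot rebinds that slot rather than mutating the shared string.

```python
# Repeating a list that contains a *mutable* object aliases that object.
shared = [[0]] * 2
shared[0][0] = 10              # mutates the single inner list both slots refer to
print(shared)                  # [[10], [10]]
print(shared[0] is shared[1])  # True

# The count-based answer only multiplies the inner list of (immutable) strings.
names = [['Matt', 'Matt', 'Paul'], ['Matt']]
matts = [lst.count('Matt') * ['Matt'] for lst in names]
matts[0][0] = 'John'           # rebinds one slot; the other entries are untouched
print(matts)                   # [['John', 'Matt'], ['Matt']]
```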

IIUC, you can do this with a nested list comprehension like below:

>>> names = [['Matt', 'Matt', 'Paul'], ['Matt']]
>>> [[name for name in lst_name if name=='Matt'] for lst_name in names]
[['Matt', 'Matt'], ['Matt']]

I'mahdi

Use the filter function:

matts = [list(filter(lambda x: x=='Matt', namelist)) for namelist in names]
Mortz