Delete same rows from list of lists in python

Question

I have a list of lists in Python. I would like to remove the rows whose first element already appears in an earlier row of the block.

block = [
    ['alfa', 'T31360N', '2013-12-19 12:07:2'],
    ['beta', 'D41535N', '2013-12-19 12:20:1'],
    ['gamma', 'E61460N', '2013-12-19 13:58:2'],
    ['delta', 'D133PR01', '2013-12-19 14:19:4'],
    ['beta', 'Q3332N', '2013-12-19 14:19:5']
]

How can I delete the rows that start with 'beta' from the list?

Do you mean a list of lists? There is no such thing as a *block* in Python. — Martijn Pieters, Mar 19 '14 at 16:55
Did you want to remove *both* the lists that start with `'beta'`? Or just the second one? — Martijn Pieters, Mar 19 '14 at 16:59
If it is not a problem I would like to see both solutions :-). Mainly I would like to delete only the second one. — GergA, Mar 19 '14 at 17:00

score 3 · Accepted Answer · edited May 23 '17 at 11:49

Adapting "How do you remove duplicates from a list whilst preserving order?" to your list:

seen = set()
block = [row for row in block if row[0] not in seen and not seen.add(row[0])]

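This rebuilds block so that it contains only the first row for each distinct first element; any later row whose first element has already been seen is dropped. The `not seen.add(row[0])` part is always true, because set.add() returns None; it is there only to record the first element as seen.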
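
If the one-line comprehension is hard to read, the same filter can be written as an explicit loop. This is only a sketch: the function name keep_first_by_key is made up for illustration and is not part of the linked question or of any library.

```python
def keep_first_by_key(rows):
    """Keep only the first row for each distinct first element (hypothetical helper)."""
    seen = set()   # first elements encountered so far
    kept = []
    for row in rows:
        key = row[0]
        if key in seen:
            continue  # an earlier row already starts with this value
        seen.add(key)
        kept.append(row)
    return kept


block = [
    ['alfa', 'T31360N', '2013-12-19 12:07:2'],
    ['beta', 'D41535N', '2013-12-19 12:20:1'],
    ['gamma', 'E61460N', '2013-12-19 13:58:2'],
    ['delta', 'D133PR01', '2013-12-19 14:19:4'],
    ['beta', 'Q3332N', '2013-12-19 14:19:5'],
]
block = keep_first_by_key(block)
```

The loop keeps the rows in their original order, just like the comprehension.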

To instead remove every row whose first element occurs more than once, keeping only the rows with a unique first element, use a collections.Counter() object to count how often each first element occurs, then filter block:

from collections import Counter

counts = Counter(row[0] for row in block)
block = [row for row in block if counts[row[0]] == 1]
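
To see what the counter holds for the sample data, here is a small self-contained sketch. The names duplicated and unique_rows are introduced here for illustration only; the filter is equivalent to the `counts[row[0]] == 1` test above.

```python
from collections import Counter

block = [
    ['alfa', 'T31360N', '2013-12-19 12:07:2'],
    ['beta', 'D41535N', '2013-12-19 12:20:1'],
    ['gamma', 'E61460N', '2013-12-19 13:58:2'],
    ['delta', 'D133PR01', '2013-12-19 14:19:4'],
    ['beta', 'Q3332N', '2013-12-19 14:19:5'],
]

# Count how many rows start with each value.
counts = Counter(row[0] for row in block)

# First elements that occur in more than one row (illustrative name).
duplicated = {key for key, n in counts.items() if n > 1}

# Drop every row that starts with a duplicated value.
unique_rows = [row for row in block if row[0] not in duplicated]
```

Here counts['beta'] is 2 and every other first element has a count of 1, so both 'beta' rows are removed.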

Demo:

>>> from pprint import pprint
>>> from collections import Counter
>>> block = [
...     ['alfa', 'T31360N', '2013-12-19 12:07:2'],
...     ['beta', 'D41535N', '2013-12-19 12:20:1'],
...     ['gamma', 'E61460N', '2013-12-19 13:58:2'],
...     ['delta', 'D133PR01', '2013-12-19 14:19:4'],
...     ['beta', 'Q3332N', '2013-12-19 14:19:5']
... ]
>>> seen = set()
>>> pprint([row for row in block if row[0] not in seen and not seen.add(row[0])])
[['alfa', 'T31360N', '2013-12-19 12:07:2'],
 ['beta', 'D41535N', '2013-12-19 12:20:1'],
 ['gamma', 'E61460N', '2013-12-19 13:58:2'],
 ['delta', 'D133PR01', '2013-12-19 14:19:4']]
>>> counts = Counter(row[0] for row in block)
>>> pprint([row for row in block if counts[row[0]] == 1])
[['alfa', 'T31360N', '2013-12-19 12:07:2'],
 ['gamma', 'E61460N', '2013-12-19 13:58:2'],
 ['delta', 'D133PR01', '2013-12-19 14:19:4']]

Is there any advantage to doing `seen = set()` rather than `seen = []` and changing `seen.add()` to `seen.append()`? — Tom Fenech, Mar 19 '14 at 17:07
@TomFenech: Yes, because `set` membership testing is much faster. — Martijn Pieters, Mar 19 '14 at 17:10
Thanks. That should have been phrased better. I was assuming that you'd picked the better way, I was just wondering why! — Tom Fenech, Mar 19 '14 at 17:14
