47

I have a function to pick out lumps from a list of strings and return them as another list:

import re

def filterPick(lines, regex):
    result = []
    for l in lines:
        match = re.search(regex, l)
        if match:
            result.append(match.group(1))
    return result

Is there a way to reformulate this as a list comprehension? Obviously it's fairly clear as is; just curious.


Thanks to those who contributed, special mention for @Alex. Here's a condensed version of what I ended up with; the regex match method is passed to filterPick as a "pre-hoisted" parameter:

import re

def filterPick(lines, search):
    # 'search' is the bound search method of a precompiled pattern
    return [(l, m.group(1)) for l in lines for m in (search(l),) if m]

theList = ["foo", "bar", "baz", "qurx", "bother"]
searchRegex = re.compile('(a|r$)').search
x = filterPick(theList,searchRegex)

>> [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]
Brent.Longborough

5 Answers

79
[m.group(1) for l in lines for m in [regex.search(l)] if m]

The "trick" is the for m in [regex.search(l)] part -- that's how you "assign" a value that you need to use more than once, within a list comprehension -- add just such a clause, where the object "iterates" over a single-item list containing the one value you want to "assign" to it. Some consider this stylistically dubious, but I find it practical sometimes.
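For comparison, here is a minimal sketch that puts the comprehension next to the explicit loop it replaces. The names `pick_loop` and `pick_comprehension` and the sample data are illustrative assumptions, not part of the question.

```python
import re

def pick_loop(lines, regex):
    """Explicit loop: keep group 1 of every line that matches."""
    result = []
    for l in lines:
        m = regex.search(l)
        if m:
            result.append(m.group(1))
    return result

def pick_comprehension(lines, regex):
    """Same logic; the inner 'for' binds m to the single search result."""
    return [m.group(1) for l in lines for m in [regex.search(l)] if m]

sample = ["foo", "bar", "baz", "qurx", "bother"]  # illustrative data
pattern = re.compile(r"(a|r$)")
assert pick_loop(sample, pattern) == pick_comprehension(sample, pattern)
```

The inner clause runs exactly once per line, so it behaves like an assignment of `m` before the `if m` filter is checked.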

Alex Martelli
  • 1
    Alex, I like that; thanks and +1. I have some fairly heavy lifting to do with this code - should I worry about the extra overhead of setting-up and tearing-down the "faux iterator"? BTW I subscribe to the doctrine of "optimise later". – Brent.Longborough Mar 13 '10 at 00:12
  • 1
    @Brent, the "faux iterator" should be negligible wrt the search call; one minor optimization is to use `(regex.search(l),)` in lieu of `[regex.search(l)]` (which I find more readable but is minutely slower) -- I thought you couldn't possibly be in a hurry as you were actually calling the `re.search` function from the module rather than the re object's method. Pulling `regex.search` as a bound method outside of the listcomp is another minor but useful optimization, btw. – Alex Martelli Mar 13 '10 at 01:15
  • as soon as I saw your answer I realised that using re.search was not the best way to go. Could you clarify for me how you would "[pull the] regex.search as a bound method outside of the listcomp"? I really appreciate your patience with a listcomp and Python noob. – Brent.Longborough Mar 13 '10 at 10:08
  • 1
    @Brent, `src=regex.search; lst=[m.group(1) for l in lines for m in [src(l)] if m]` is the "bound method hoisting" optimization (does the method lookup once instead of redoing it for each line -- Python doesn't hoist attribute lookups for you, when you need such optimization you can however do it manually, as I just showed). – Alex Martelli Mar 13 '10 at 14:47
  • 6
    @AlexMartelli, I find a nested comprehension more readable than the "double for hack": `search = re.compile('...').search; out = [m.group(1) for m in map(search, lines) if m]` You could use nested brackets, but in this case a map() is just as readable, since the transformer is a simple callable, and it's actually faster! (33% faster than for hack with a tuple, **40% faster than for hack with a list**, and 15% faster than a nested list comprehension using brackets.) Measured using a moderately complex regexp `'(a.*b.*c)'`, which is O(n²), on a huge list of file names. – Tobia Feb 21 '13 at 18:43
  • @AlexMartelli - this is a good candidate to rewrite using ':=' operator - it is 2021! – PaulMcG Jul 09 '21 at 21:58
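The variants discussed in these comments can be sketched side by side. This is only an illustration: the names `hoisted`, `mapped` and `walrus` and the sample data are assumptions, the last form needs Python 3.8 or later, and no timing claims are implied.

```python
import re

lines = ["foo", "bar", "baz", "qurx", "bother"]  # illustrative data
search = re.compile(r"(a|r$)").search  # bound method looked up once

# Single-item tuple clause, using the hoisted bound method
hoisted = [m.group(1) for l in lines for m in (search(l),) if m]

# map() feeds each match object (or None) into the comprehension
mapped = [m.group(1) for m in map(search, lines) if m]

# Assignment expression (Python 3.8+)
walrus = [m.group(1) for l in lines if (m := search(l))]

assert hoisted == mapped == walrus
```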
12
return [m.group(1) for m in (re.search(regex, l) for l in lines) if m]
Ignacio Vazquez-Abrams
7

It could be shortened a little

def filterPick(lines, regex):
    # search, not match: match would only test the start of each line
    matches = map(re.compile(regex).search, lines)
    return [m.group(1) for m in matches if m]

You could put it all in one line, but that would mean you would have to match every line twice which would be a bit less efficient.
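A sketch of the single-comprehension form alluded to here, under the hypothetical name `filter_pick_one_line` and using `search` as in the question. It runs the regex twice on every matching line, which is the inefficiency mentioned above.

```python
import re

def filter_pick_one_line(lines, regex):
    pattern = re.compile(regex)
    # search() runs once in the filter and again to build the value
    return [pattern.search(l).group(1) for l in lines if pattern.search(l)]

print(filter_pick_one_line(["foo", "bar", "baz", "qurx", "bother"], "(a|r$)"))
```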

Wolph
5

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), you can bind a local variable inside a list comprehension to avoid evaluating the same expression more than once:

import re

items = ["foo", "bar", "baz", "qurx", "bother"]
[(x, match.group(1)) for x in items if (match := re.compile('(a|r$)').search(x))]
# [('bar', 'a'), ('baz', 'a'), ('bother', 'r')]

This:

  • Binds the result of re.compile('(a|r$)').search(x) to the name match (either None or a Match object)
  • Uses that value in the if clause, where None is falsy, to filter out non-matching elements
  • Reuses match in the output expression to extract the first group (match.group(1)).
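Applied to the function from the question, the same idea might look like the sketch below. The name `filter_pick` and the choice to compile the pattern once outside the comprehension are assumptions for illustration; Python 3.8 or later is required.

```python
import re

def filter_pick(lines, regex):
    search = re.compile(regex).search  # compile once, reuse for every line
    # m is bound in the condition and reused in the result expression
    return [m.group(1) for l in lines if (m := search(l))]

items = ["foo", "bar", "baz", "qurx", "bother"]
result = filter_pick(items, "(a|r$)")
```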
Xavier Guihot
-15
>>> "a" in "a visit to the dentist" 
True 
>>> "a" not in "a visit to the dentist" 
False

That also works when you are looking for an item in a list or tuple:

>>> P = 'a', 'b', 'c'
>>> 'b' in P
True

waldi
  • 1
    How does that answer the question? – Oren S Nov 17 '12 at 19:20
  • This may be a better way than re to check whether an input is in a list, but it doesn't work if you want to grep results. You can always wrap a simple for loop around the re output; doing it manually is not much different from using a function that does the same... – m3nda May 17 '16 at 16:41