8

In sum: I need to write a List Comprehension in which i refer to list that is being created by the List Comprehension.

This might not be something you need to do every day, but i don't think it's unusual either.

Maybe there's no answer here--still, please don't tell me i ought to use a for loop. That might be correct, but it's not helpful. The reason is the problem domain: this line of code is part of an ETL module, so performance is relevant, and so is the need to avoid creating a temporary container--hence my wish to code this step in a L/C. If a for loop would work for me here, i would just code one.

In any event, i am unable to write this particular list comprehension. The reason: the expression i need to write has this form:

[ some_function(s) for s in raw_data if s not in this_list ]

In that pseudo-code, "this_list" refers to the list created by evaluating that list comprehension. And that's why i'm stuck--because this_list isn't built until my list comprehension is evaluated, and because this list isn't yet built by the time i need to refer to it, i don't know how to refer to it.

What i have considered so far (and which might be based on one or more false assumptions, though i don't know exactly where):

  • doesn't the python interpreter have to give this list-under-construction a name? i think so

  • that temporary name is probably taken from some bound method used to build my list ('sum'?)

  • but even if i went to the trouble to find that bound method and assuming that it is indeed the temporary name used by the python interpreter to refer to the list while it is under construction, i am pretty sure you can't refer to bound methods directly; i'm not aware of such an explicit rule, but those methods (at least the few that i've actually looked at) are not valid python syntax. I'm guessing one reason why is so that we do not write them into our code.

so that's the chain of my so-called reasoning, and which has led me to conclude, or at least guess, that i have coded myself into a corner. Still i thought i ought to verify this with the Community before turning around and going a different direction.

doug
  • 69,080
  • 24
  • 165
  • 199
  • 1
    Looks like you want a `set` (judging from the pseudocode, the rest I haven't read). – kennytm Feb 20 '11 at 09:58
  • my fault--i should have moved the third-from-the-last sentence up near the front of the question. It says "But before anyone tells me that there are a thousand more simpler ways to do that in python--yes, i know. It's just an example to illustrate the problem. In my case, the actual expression is a sampling algorithm" – doug Feb 20 '11 at 10:08
  • (1) What is the result of `some_function(s)`? `s` after being fiddled with? (2) `s is not in this_list` -- each s is coming from raw_data; how could it already be in this_list??? – John Machin Feb 20 '11 at 10:16
  • 1
    @doug: `the actual expression is a sampling algorithm` care to elaborate more on this? – eat Feb 20 '11 at 10:23
  • 3
    'performance is relevant'? Why do you think a list comprehension would be any faster than writing the loop out in full: they won't be a measurable difference if you have a simple list comprehension. More complex contorted list comprehensions may actually be faster if written as clean expanded code. – Duncan Feb 20 '11 at 11:12
  • Martelli describes a way that might work in Py25 and Py26 in this duplicate http://stackoverflow.com/questions/2638478/recursive-list-comprehension-in-python – XORcist Feb 20 '11 at 12:02
  • 2
    "Look at this code, but solve the completely different problem that is only in my head" is not a good question. Just write down what you really need instead. Btw, Generator functions are just as efficient as LCs - you can't not should do everything with LCs. – Jochen Ritzel Feb 20 '11 at 14:00
  • 2
    "Performance is relevant" but you want to use an O(N^2) algorithm rather than O(N). –  Feb 20 '11 at 15:40

4 Answers4

3

I don't see why you need to do this in one go. Either iterate through the initial data first to eliminate duplicates - or, even better, convert it to a set as KennyTM suggests - then do your list comprehension.

Note that even if you could reference the "list under construction", your approach would still fail because s is not in the list anyway - the result of some_function(s) is.

Daniel Roseman
  • 588,541
  • 66
  • 880
  • 895
3

There used to be a way to do this using the undocumented fact that while the list was being built its value was stored in a local variable named _[1].__self__. However that quit working in Python 2.7 (maybe earlier, I wasn't paying close attention).

You can do what you want in a single list comprehension if you set up an external data structure first. Since all your pseudo code seemed to be doing with this_list was checking it to see if each s was already in it -- i.e. a membership test -- I've changed it into a set named seen as an optimization (checking for membership in a list can be very slow if the list is large). Here's what I mean:

raw_data = [c for c in 'abcdaebfc']

seen = set()
def some_function(s):
    seen.add(s)
    return s

print [ some_function(s) for s in raw_data if s not in seen ]
# ['a', 'b', 'c', 'd', 'e', 'f']

If you don't have access to some_function, you could put a call to it in your own wrapper function that added its return value to the seen set before returning it.

Even though it wouldn't be a list comprehension, I'd encapsulate the whole thing in a function to make reuse easier:

def some_function(s):
    # do something with or to 's'...
    return s

def add_unique(function, data):
    result = []
    seen = set(result) # init to empty set
    for s in data:
        if s not in seen:
            t = function(s)
            result.append(t)
            seen.add(t)
    return result

print add_unique(some_function, raw_data)
# ['a', 'b', 'c', 'd', 'e', 'f']

In either case, I find it odd that the list being built in your pseudo code that you want to reference isn't comprised of a subset of raw_data values, but rather the result of calling some_function on each of them -- i.e. transformed data -- which naturally makes one wonder what some_function does such that its return value might match an existing raw_data item's value.

martineau
  • 119,623
  • 25
  • 170
  • 301
2

As far as I know, there is no way to access a list comprehension as it's being built.

As KennyTM mentioned (and if the order of the entries is not relevant), then you can use a set instead. If you're on Python 2.7/3.1 and above, you even get set comprehensions:

{ some_function(s) for s in raw_data }

Otherwise, a for loop isn't that bad either (although it will scale terribly)

l = []
for s in raw_data:
    item = somefunction(s)
    if item not in l:
        l.append(item)
Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
  • Another way of doing it in older Python's, would be to transform the list into a set: `set([ some_function(s) for s in raw_data ])` – Avi Feb 20 '11 at 10:09
  • again--my pseudo-code is just an example. I don't need the unique items, that's just an artifact of my example. The crux is the reference in the L/C to 'this_list'--that's the problem i'm trying to solve. And you're exactly right about why your answer won't work for me, and why i've avoided it--this is an ETL module, 'scale' is indeed implicated here. Clearly i want to avoid creating auxillary/tempoary containers, hence my strong wish to code an L/C. – doug Feb 20 '11 at 10:19
  • 2
    @doug: you mention scale when dissing Tim's answer but the "is s in this_list" of you preferred solution will be executed N times and will perform O(N) comparisons -- looks like O(N**2) all-up to me. How big is N? What do you use to compare two objects for equivalence? – John Machin Feb 20 '11 at 10:27
0

Why don't you simply do:[ some_function(s) for s in set(raw_data) ]

That should do what you are asking for. Except when you need to preserve the order of the previous list.

jammon
  • 3,404
  • 3
  • 20
  • 29