I have the following line of code:

val = tuple(s for s in name_list if str(ngram) in s)

This searches the list 'name_list' and collects into a tuple every element that contains the substring str(ngram). Python seems to have this powerful 'if in' magic, and it is beautifully compact.

I'm used to seeing loops and conditionals as:

for line in file:

and

if x == y:

So can someone explain the actual structure of these one-liners?

The reason I ask is that in my particular case 'name_list' has 60K+ elements. I want to return and get the heck out of my function as soon as ten matches are found. So more specifically:

s
for s in name_list:
    if str(ngram) in s:
        if len(s) <= 10:
            return True

It's the variable s standing alone in front that throws me off: I don't know how to refer to it, or whether s is simply one substring match in my case, or a list that is appended to with each match and then converted to a tuple.

I'm going to need some serious psychological help here.

Doug
  • Search for "python list comprehension". Details: https://www.python.org/dev/peps/pep-0202/ And please don't ask two questions in one. –  Jun 06 '15 at 05:33
  • You could use a list to store the substrings and check the length of the list in the if statement – therealprashant Jun 06 '15 at 05:39

2 Answers

To cut an arbitrary iterable (including a generator), use the function itertools.islice:

import itertools

gen = (s for s in name_list if str(ngram) in s)  # this is a generator
val = tuple(itertools.islice(gen, 10))  # take only the first 10 elements

More about generator expressions in this question.

To address your concern about reading the whole sequence, here is an example:

def gen():
    for x in range(1000000):  # a lot
        print('Yielding', x)  # to demo the side effect
        yield x

Then list(itertools.islice(gen(), 3)) will return [0, 1, 2] and print:

Yielding 0
Yielding 1
Yielding 2

And then the generator will stop, because no one is asking it to proceed. That's called lazy evaluation (btw, the article explains it exactly using the example of islice and other itertools).
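The same holds for the generator expression in the first snippet. Here is a minimal sketch to check it, assuming made-up sample data for name_list and ngram and a hypothetical helper, counting, that records how many elements were actually read from the list:

```python
import itertools

# Made-up sample data standing in for the real 60K+ element list.
name_list = ["name%d" % i for i in range(60000)]
ngram = 7

stats = {"pulled": 0}  # how many elements were read from name_list

def counting(iterable):
    """Hypothetical helper: pass items through unchanged while counting them."""
    for item in iterable:
        stats["pulled"] += 1
        yield item

gen = (s for s in counting(name_list) if str(ngram) in s)
val = tuple(itertools.islice(gen, 10))

print(val[-1])          # name72
print(stats["pulled"])  # 73
```

The tenth match is "name72", so only the first 73 elements are read; the remaining ones are never touched.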

bereal
  • although on second thought this will still find all values, meaning it will iterate through all 60K+ elements in 'name_list' *then* it will just take the top 10 results *after* it's iterated through all elements in 'name_list.' Or it may be a simultaneous restraint on gen, but I think not. – Doug Jun 06 '15 at 17:02
  • @Doug No, it will take only as many values as it needs, that's the coolness of the generators. Give me a sec, will update the answer. – bereal Jun 06 '15 at 17:03
  • great, sorry to belabor this but it seems different to call a function (like in your 2nd example) and to use an already defined object (1st example) which has been assigned to that value.. Is this valid confusion? – Doug Jun 06 '15 at 17:19
  • 1
    That's just a different syntax. The first one is a generator expression and the function returns a generator, but they are the different ways to create the same kind of object. – bereal Jun 06 '15 at 17:23
List comprehensions.

I recommend reading the list comprehensions section in the official Python documentation.

Here are some examples of list comprehensions. You can use nested loops and conditionals in list comprehensions.

The following list comprehension builds a list [2, 4, 6, 8, 10] by iterating over a range 1-10; the integer yielded in each iteration is tested for divisibility by two.

[n for n in range(1, 11) if n % 2 == 0]

You can also use nested loops and supplement the element with another value based on a condition.
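As a small sketch of both ideas (the variable names pairs and labels are made up for illustration):

```python
# Nested loops: every (x, y) pair from a small range where x < y.
pairs = [(x, y) for x in range(3) for y in range(3) if x < y]
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# Conditional expression: swap odd numbers for a label instead of dropping them.
labels = [n if n % 2 == 0 else "odd" for n in range(1, 6)]
print(labels)  # ['odd', 2, 'odd', 4, 'odd']
```

Note the difference in position: an `if` at the end filters elements out, while `value if condition else other` at the front chooses what each element becomes.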

Anywho, this:

val = tuple(s for s in name_list if str(ngram) in s)

Is pretty much this (just using lists for simplicity):

val = []
for s in name_list:
    if str(ngram) in s:
        val.append(s)
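If the goal is to stop as soon as ten matches are found, the explicit loop can bail out early. A sketch, assuming a function name first_matches and a limit parameter, neither of which comes from the question:

```python
def first_matches(name_list, ngram, limit=10):
    """Collect at most `limit` elements of name_list that contain str(ngram)."""
    val = []
    for s in name_list:
        if str(ngram) in s:
            val.append(s)
            if len(val) >= limit:
                break  # enough matches, skip the rest of the list
    return tuple(val)

print(first_matches(["ab", "bc", "abc", "cab", "xyz"], "ab", limit=2))  # ('ab', 'abc')
```

Here len(val) serves as the running count, so no separate counter variable is needed.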
  • @Lutz yes sir yes sir – Doug Jun 06 '15 at 05:48
  • @LillianLemmer so where would a return be appropriate here? how can i refer to the length of val unless I create another variable which will keep the count within the actual code? – Doug Jun 06 '15 at 05:50
  • @Doug read about `yield` and `iterators`, or use a comprehension, e.g., `[x for x in number_list if len(x) == 10]` – Lillian Seabreeze Jun 06 '15 at 05:53
  • @Doug there is no function here, so there's no need for `return`. – bereal Jun 06 '15 at 05:53
  • @bereal why don't we be real man, i can't iterate through that list each time it's HUGE, and i've got other work to do. so perhaps return isn't the correct philosophy, but something that says: "hey iterator, we're at 10 you stop right there young man" – Doug Jun 06 '15 at 05:56