This is an exercise on Kaggle (Python, Strings and Dictionaries). I wasn't able to solve it, so I peeked at the solution and tried to rewrite it the way I would do it (i.e. not necessarily as sophisticated, but in a way I understand). I use Python Tutor to visualise what the code is doing and I understand most of it, but the for loop is tripping me up.

This version works and gives me [0]:

normalised = (token.strip(",.").lower() for token in tokens)

But if I rewrite it as:

for token in tokens:
    normalised = token.strip(",.").lower()

it doesn't work; it gives me [0, 2] (presumably because casino is in casinoville). Can someone write the multi-line equivalent, starting with for token in tokens: ...?

The code is below for a bit more context.

def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword.
    Returns list of the index values into the original list for all documents
    containing the keyword.

    Example:
    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    [0]
    """
    indices = []
    counter = 0
    for doc in doc_list:
        tokens = doc.split()
        # This is the line I am asking about.
        normalised = (token.strip(",.").lower() for token in tokens)
        if keyword.lower() in normalised:
            indices.append(counter)
        counter += 1
    return indices

# Test: output should be [0]
doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
keyword = 'Casino'
print(word_search(doc_list, keyword))
Joanna
  • Why do you need a for loop version? Your first version seems very pythonic. – quamrana Feb 24 '21 at 07:46
  • Wait, does that really work? – user202729 Feb 24 '21 at 07:51
  • Look up generator expressions, and try running the second snippet with pencil and paper to see how it behaves. – user202729 Feb 24 '21 at 07:52
  • The first version is copied from the solution, but I wanted to find a way to rewrite `normalised = (token.strip(",.").lower() for token in tokens)` or `normalised = [token.strip(",.").lower() for token in tokens]`. The answer is correct when using () or [], but I wanted to write the multi-line equivalent, i.e. `for token in tokens: normalised = ...` – Joanna Feb 24 '21 at 08:28

1 Answer

normalised = (token.strip(",.").lower() for token in tokens) is a generator expression: it creates a generator object, not a tuple. Let's explore this:

>>> a = [1,2,3]
>>> [x**2 for x in a]
[1, 4, 9]

This is a list comprehension. The multi-line equivalent is:

>>> a = [1,2,3]
>>> b = []
>>> for x in a:
...     b.append(x**2)
...
>>> print(b)
[1, 4, 9]

Using parentheses instead of square brackets does not return a tuple (as one might suspect naively, as I did earlier), but a generator:

>>> a = [1,2,3]
>>> (x**2 for x in a)
<generator object <genexpr> at 0x0000024BD6E33B48>

We can iterate over this object with next:

>>> a = [1,2,3]
>>> b = (x**2 for x in a)
>>> next(b)
1
>>> next(b)
4
>>> next(b)
9
>>> next(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The same generator can be written in multi-line form as a generator function that uses yield:

>>> a = [1,2,3]
>>> def my_iterator(x):
...     for k in x:
...         yield k**2
...
>>> b = my_iterator(a)
>>> next(b)
1
>>> next(b)
4
>>> next(b)
9
>>> next(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
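
Calling next by hand is mostly useful for seeing what happens step by step. As a minimal sketch (the name squares is made up for this illustration), a for loop does the same thing: it calls next repeatedly and stops quietly when StopIteration is raised.

```python
def squares(values):
    """Hypothetical generator function, same idea as my_iterator above."""
    for value in values:
        yield value ** 2


# A for loop drives the generator and handles StopIteration for us.
collected = []
for square in squares([1, 2, 3]):
    collected.append(square)

# Roughly what the for loop does behind the scenes.
gen = squares([1, 2, 3])
manual = []
while True:
    try:
        manual.append(next(gen))
    except StopIteration:
        break

print(collected)  # [1, 4, 9]
print(manual)     # [1, 4, 9]
```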

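In the original example, the in operator is used as a membership test. This works for both the list and the generator, but a generator can only be consumed once: in advances it until it finds a match (or runs out of items), so a second test on the same generator continues from where the first one stopped.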

>>> a = [1,2,3]
>>> b = [x**2 for x in a]
>>> 9 in b
True
>>> 5 in b
False
>>> b = (x**2 for x in a)
>>> 9 in b
True
>>> 9 in b
False
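
To make the "only works once" behaviour more concrete, here is a small sketch (variable names are just for illustration) showing that a membership test consumes the generator up to and including the first match, and leaves the rest untouched:

```python
squares = (x**2 for x in [1, 2, 3])

found_first = 4 in squares   # consumes 1 and 4, then stops at the match
remaining = list(squares)    # only the items after the match are left
found_again = 4 in squares   # the generator is now exhausted

print(found_first)  # True
print(remaining)    # [9]
print(found_again)  # False
```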

Here is a discussion of the issue with generator reset: Resetting generator object in Python
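
Applied to the function from the question, a multi-line equivalent could look like the sketch below. The first function builds a list with append, which matches the list comprehension. The second (named word_search_overwriting here purely for illustration) reproduces the loop from the question: normalised is reassigned on every pass, so after the loop it is a single string holding only the last token, and in on a string is a substring test, which is why "casinoville" matches.

```python
def word_search(doc_list, keyword):
    """Return the indices of documents containing keyword as a whole word."""
    indices = []
    counter = 0
    for doc in doc_list:
        tokens = doc.split()
        # Multi-line equivalent of
        # [token.strip(",.").lower() for token in tokens]
        normalised = []
        for token in tokens:
            normalised.append(token.strip(",.").lower())
        # 'in' on a list compares whole elements.
        if keyword.lower() in normalised:
            indices.append(counter)
        counter += 1
    return indices


def word_search_overwriting(doc_list, keyword):
    """The loop from the question, kept only to show why it misbehaves."""
    indices = []
    counter = 0
    for doc in doc_list:
        tokens = doc.split()
        normalised = ""
        for token in tokens:
            # Overwritten on every pass: only the last token survives.
            normalised = token.strip(",.").lower()
        # 'in' on a string is a substring test.
        if keyword.lower() in normalised:
            indices.append(counter)
        counter += 1
    return indices


doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
print(word_search(doc_list, "Casino"))              # [0]
print(word_search_overwriting(doc_list, "Casino"))  # [0, 2]
```

Document 0 still matches in the second version only because "Casino." happens to be its last token.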

I hope that clarified the differences between list comprehensions, generators and multi-line loops.

mad
  • `normalised` is not a tuple, it is a `generator` object, which is quite a big difference. When you write a `tuple comprehension`, it returns a `generator`. – alexzander Feb 24 '21 at 08:20
  • Could you show these in multi-line form, starting with `for token in tokens:`? `normalised = (token.strip(",.").lower() for token in tokens)` and `normalised = [token.strip(",.").lower() for token in tokens]` – Joanna Feb 24 '21 at 08:32
  • Interesting, I was not aware of that. Thanks for clarifying! – mad Feb 24 '21 at 08:36
  • Thank you! It comes up correct when I replace it with `normalized = []`, then `for token in tokens: normalized.append(token.strip('.,').lower())`. I don't think I've covered generators yet, but this will be a good reference point for me. – Joanna Feb 26 '21 at 19:05