2

If flattened is just a list of strings, for example

['There','is','only','passion','and','piece','is','a','lie','lie','lie']

then in the following two lines

c = Counter(flattened)
vocab = [x for x, count in c.items() if count>=2]

what does the part [x for x, ...] mean? Also, shouldn't count be a tuple, since I suppose it is a Counter item? How does the part count>=2 work?

Note: I understand from debugging that the first line converts the list into a Counter and the second one removes the items that occurred fewer than twice, but I can't really interpret the syntax.

ShadowRanger
kk96kk
  • 1
    I feel like this is largely just asking for an [Explanation of List Comprehensions](https://stackoverflow.com/questions/19559625/explanation-of-list-comprehensions) and [Tuple unpacking in for loops](https://stackoverflow.com/q/10867882/364696). Not sure if there is a more complete single duplicate though. – ShadowRanger Oct 02 '18 at 18:54

4 Answers

2

So the syntax here is a little confusing, but what's actually happening is that each item in c.items() is a tuple containing a word and its count.

A clearer way of writing this would be:

vocab = [x for (x, count) in c.items() if count>=2]

but it could also be done like this:

vocab = [x[0] for x in c.items() if x[1]>=2]

where x is a tuple.
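
To check that the two spellings agree, here is a small sketch using the list from the question. The names `unpacked`, `indexed` and `pair` are only illustrative; `pair` replaces `x` in the second form so the tuple and the word are not confused.

```python
from collections import Counter

flattened = ['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie']
c = Counter(flattened)

# Unpack each (word, count) pair directly in the for clause.
unpacked = [x for x, count in c.items() if count >= 2]

# Keep each pair as a single tuple and index into it instead.
indexed = [pair[0] for pair in c.items() if pair[1] >= 2]

print(sorted(unpacked))     # ['is', 'lie']
print(unpacked == indexed)  # True
```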

It can also be helpful to look at what c actually looks like. If you print c (shown here with Python 2's print statement), you see:

>>> print c
Counter({'lie': 3, 'is': 2, 'and': 1, 'a': 1, 'There': 1, 'only': 1, 'passion': 1, 'piece': 1})

and c.items()

>>> print c.items()
[('and', 1), ('a', 1), ('lie', 3), ('is', 2), ('There', 1), ('only', 1), ('passion', 1), ('piece', 1)]
wpercy
  • 1
    In other words, this is equivalent of `[x for (x, count)...]`. – bereal Oct 02 '18 at 18:37
  • yeah, added the more clear unpacking syntax for clarity – wpercy Oct 02 '18 at 18:37
  • @bereal: It's not just equivalent, it's identical; the defining attribute of all non-empty `tuple`s is the presence of a comma, not the parentheses; parentheses are only needed when there would be ambiguity (e.g. with function call parentheses) or precedence issues (including `tuple`s inside other collection literals). When it's not really about making `tuple`s (e.g. multiple returns, unpacking values), it's considered more Pythonic to *not* include the parentheses when they're not necessary. – ShadowRanger Oct 02 '18 at 18:44
  • @ShadowRanger in a certain way, the words "equivalent" and "identical" are equivalent (or identical). – bereal Oct 02 '18 at 18:49
2

Counter returns a dictionary-like structure, so you iterate over its keys and values: the key is x and the value is count. If we look closely at c.items():

c.items() #list of tuples with (key,value)

[('and', 1),
 ('a', 1),
 ('lie', 3),
 ('is', 2), # x->'is' ,count->2
 ('There', 1),
 ('only', 1),
 ('passion', 1),
 ('piece', 1)]

So when you iterate over this list, each tuple has two components: a word and its associated count. The condition checks whether count>=2; if it holds, the list comprehension returns the key, which is x.
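
On Python 3 the display differs slightly, because c.items() is a view rather than a list. A minimal sketch, assuming Python 3.7 or later (where dicts keep insertion order) and a shortened word list chosen only for illustration:

```python
from collections import Counter

c = Counter(['is', 'lie', 'is', 'lie', 'lie', 'a'])

items = c.items()           # on Python 3 this is a view, not a list
print(isinstance(c, dict))  # True: Counter is a dict subclass
print(list(items))          # [('is', 2), ('lie', 3), ('a', 1)]

c['a'] += 1                 # the view reflects later changes to c
print(list(items))          # [('is', 2), ('lie', 3), ('a', 2)]
```

Iterating the view yields the same (key, value) tuples either way, so the comprehension works unchanged.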

mad_
  • The display you provide is only the case on Python 2; `Counter` isn't just "dictionary-like", it's a `dict` subclass, and on Python 3, that means `.items()` returns a `dict` view (a live view of the underlying `dict` with fixed overhead no matter the underlying `dict`'s size, which changes with the `dict`), not a `list` (which on Py2 is an eager copy of the key/value pairs and doesn't subsequently change with the `dict` it was generated from). As it happens, they both iterate the same (assuming the `dict` isn't mutated mid-iteration), you just won't see a `list` without wrapping in `list()`. – ShadowRanger Oct 02 '18 at 18:47
0

[x for x, ...] is just using x as a variable while iterating over some iterable.

x, count captures the two items of each tuple produced by c.items().

If you were to run: for _ in c.items(): print(_) it would print each tuple, of the form (x, count), on its own line.

[x for x, count in c.items() if count >= 2] just keeps x in the resulting list while using the count value as a filter.
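
The unpacking step can be seen on its own in a plain loop. This is a sketch with a hand-written list `pairs` standing in for c.items(); the names `pairs`, `pair` and `kept` are made up for the example.

```python
pairs = [('lie', 3), ('is', 2), ('and', 1)]  # stand-in for c.items()

for pair in pairs:
    x, count = pair  # the same unpacking the for clause performs
    print(x, count)

# The comprehension does the unpacking and the filtering in one expression.
kept = [x for x, count in pairs if count >= 2]
print(kept)  # ['lie', 'is']
```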

Quentin
0

Let's break it down into lines:

vocab = [           # line0
         x          # line1
         for        # line2
         x, count   # line3
         in         # line4
         c.items()  # line5
         if         # line6
         count>=2]  # line7

Each tuple from c.items() is composed of a key, x, (the thing that was counted) and a count (the number of times that key was seen).

On each loop, you can imagine the next tuple is pulled, then unpacked, so that instead of needing to use a single value with indices 0 and 1, you can just refer to them by name; anontuple[0] becomes x, anontuple[1] becomes count.

The count>=2 line then filters the results; if count is less than 2, we stop processing this item, and pull the next one.

The plain x on the far left is the item to produce; when the filtering check is passed, we shove the corresponding x into the resulting list unmodified.

Converting to a regular loop, it would look like this (lines matched to listcomp lines):

vocab = []                  # line0
for x, count in c.items():  # lines 2-5
    if count >= 2:          # lines 6-7
        vocab.append(x)     # line1

If unpacking is confusing to you, you could instead imagine it as:

vocab = []              # line0
for item in c.items():  # lines 2, 4 and 5
    x = item[0]         # line3
    count = item[1]     # line3
    if count >= 2:      # lines 6-7
        vocab.append(x) # line1
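
One way to convince yourself that the loop form is faithful is to run it next to the comprehension. A sketch, where `vocab_loop` and its `minimum` parameter are hypothetical names introduced only for this comparison:

```python
from collections import Counter


def vocab_loop(counter, minimum=2):
    """Explicit-loop version of the comprehension (hypothetical helper)."""
    result = []
    for x, count in counter.items():
        if count >= minimum:
            result.append(x)
    return result


c = Counter(['There', 'is', 'only', 'passion', 'and', 'piece',
             'is', 'a', 'lie', 'lie', 'lie'])
vocab = [x for x, count in c.items() if count >= 2]

print(vocab == vocab_loop(c))  # True
```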
ShadowRanger