You cannot do this efficiently with a comprehension, because you would need a reference to the dict while it is still being created, and Python does not currently provide one. Instead, you can update the counter dict inside a plain loop: increment the count for a length if it is already a key in the dict, otherwise set it to one:
```python
def count_words_by_length(words):
    counter = {}
    for word in words:
        n = len(word)
        if n in counter:
            counter[n] += 1
        else:
            counter[n] = 1
    return counter


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
print(count_words_by_length(mod_text))
# {5: 2, 4: 2, 2: 2, 1: 1}
```
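A minimal sketch illustrating both points above: a comprehension cannot read the dict it is still building (the name is bound only after the comprehension has finished), whereas a single pass works fine, whether it is written with `dict.get()` or delegated to the standard library's `collections.Counter`. The function names are illustrative and not from the question.

```python
from collections import Counter


def self_referencing_comprehension(words):
    # Tries to read `counter` from inside its own comprehension.
    # `counter` is only bound after the comprehension finishes, so this
    # raises NameError (or its subclass UnboundLocalError, depending on
    # the Python version) for any non-empty `words`.
    counter = {len(w): counter.get(len(w), 0) + 1 for w in words}
    return counter


def count_words_by_length_get(words):
    # Same single pass as the plain loop; dict.get() supplies 0 for new keys.
    counter = {}
    for word in words:
        counter[len(word)] = counter.get(len(word), 0) + 1
    return counter


def count_words_by_length_counter(words):
    # Counter (a dict subclass) does the increment-or-initialize step itself.
    return Counter(len(word) for word in words)


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
try:
    self_referencing_comprehension(mod_text)
except NameError:
    print("the comprehension cannot see the dict it is building")
print(count_words_by_length_get(mod_text))
# {5: 2, 4: 2, 2: 2, 1: 1}
print(dict(count_words_by_length_counter(mod_text)))
# {5: 2, 4: 2, 2: 2, 1: 1}
```

Both working variants visit each word exactly once, like the explicit `if`/`else` loop.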
If you really want to use a dict comprehension, here are a few less efficient approaches:
- Counting the words of the current length once for every word. This is the least efficient approach, but the closest to your original one: every time a word is encountered, the count for its length is recomputed from scratch by scanning the whole list, even if the `dict` already has an entry for that length.

```python
def count_words_by_length_compr1(words):
    return {
        len(word): sum(1 for word_ in words if len(word_) == len(word))
        for word in words}


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
print(count_words_by_length_compr1(mod_text))
# {5: 2, 4: 2, 2: 2, 1: 1}
```
- Counting the words of every length between the minimum and the maximum length, and discarding lengths with a count of 0. Depending on the actual word lengths, this may be more or less efficient than the above. Note that the bounds have to be computed from the lengths themselves: `min(words)` and `max(words)` compare the strings alphabetically, so `len(min(words))` is not necessarily the shortest length.

```python
def count_words_by_length_compr2(words):
    return {
        n: sum(1 for word in words if len(word) == n)
        for n in range(min(map(len, words)), max(map(len, words)) + 1)
        if any(len(word) == n for word in words)}


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
print(count_words_by_length_compr2(mod_text))
# {1: 1, 2: 2, 4: 2, 5: 2}
```
- Same as above, but with more efficient discarding: the walrus operator (`:=`, available since Python 3.8) keeps each count, so every length is scanned once by `sum()` instead of once by `any()` and again by `sum()`.

```python
def count_words_by_length_compr3(words):
    return {
        n: k
        for n in range(min(map(len, words)), max(map(len, words)) + 1)
        if (k := sum(1 for word in words if len(word) == n)) > 0}


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
print(count_words_by_length_compr3(mod_text))
# {1: 1, 2: 2, 4: 2, 5: 2}
```
- Counting the words of each length that actually occurs (the lengths are pre-computed and stored in a `set`). This is somewhat more time-efficient, since the outer loop runs exactly once per distinct length (unlike all the previous comprehension-based solutions), at the expense of some extra memory.

```python
def count_words_by_length_compr4(words):
    return {
        n: sum(1 for word in words if len(word) == n)
        for n in {len(word) for word in words}}


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
print(count_words_by_length_compr4(mod_text))
# {1: 1, 2: 2, 4: 2, 5: 2}
```
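To make the efficiency remarks above a bit more concrete, here is a rough cost model (a sketch, not a benchmark): each comprehension-based version rescans the whole list once per iteration of its outer loop, so its cost is roughly the number of outer iterations times the number of words, while the plain loop visits each word once. The helper `inner_checks_per_strategy` and its keys are made up for illustration, and the model deliberately ignores the linear one-off passes (the `any()` pre-check of `compr2`, the set construction of `compr4`).

```python
def inner_checks_per_strategy(words):
    """Approximate number of word visits in each approach's inner scans.

    Comprehension versions: (outer loop iterations) * len(words).
    Plain loop: one visit per word.
    compr2 behaves like compr3 plus its extra any() pre-check (not counted).
    """
    n_words = len(words)
    lengths = [len(word) for word in words]
    span = max(lengths) - min(lengths) + 1  # outer iterations of compr2/compr3
    distinct = len(set(lengths))            # outer iterations of compr4
    return {
        "loop": n_words,
        "compr1": n_words * n_words,        # one full scan per word
        "compr3": span * n_words,           # one full scan per length in range
        "compr4": distinct * n_words,       # one full scan per distinct length
    }


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
print(inner_checks_per_strategy(mod_text))
# {'loop': 7, 'compr1': 49, 'compr3': 35, 'compr4': 28}
print(inner_checks_per_strategy(mod_text * 2))
# {'loop': 14, 'compr1': 196, 'compr3': 70, 'compr4': 56}
print(inner_checks_per_strategy(["a", "b" * 50]))
# {'loop': 2, 'compr1': 4, 'compr3': 100, 'compr4': 4}
```

Doubling the list quadruples the `compr1` figure (quadratic growth), and the last call shows how widely spread lengths make the range-based versions worse than even the per-word one.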