How to count number of words with given length in a dictionary comprehension

Question

I am trying to count the words in a list by their length using a dictionary comprehension: the key should be the length of a word, and the value should be the number of words in the list that have that length.

def words_lengths_map(text):
    mod_text = ["hello", "this", "is", "a", "list", "of","words"]
    dict1 = {len(k): k.count(k) for k in mod_text}
    print(dict1)

This produces the correct key but my value is always 1. My expected output should be:

{5:2, 4:2, 2:2, 1:1}

``k.count(k)`` counts how often a word occurs in itself. How do you expect it to occur more than once in itself? — MisterMiyagi, Nov 22 '21 at 12:37
Seeing how there are several words of length 5, 4 and 2 in your list, what result do you expect for these length collisions? — MisterMiyagi, Nov 22 '21 at 12:38
Aah.. gotcha now (I think). You want to take the *length* of each word, and count how many words have a given length? – CutePoison, Nov 22 '21 at 12:41
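
The two points raised in the comments can be checked with a short sketch. It reuses the list from the question; the names `self_counts` and `dict1` are only illustrative. A non-empty string contains itself exactly once, and when several words share a length the later entry simply overwrites the earlier one under the same key.

```python
mod_text = ["hello", "this", "is", "a", "list", "of", "words"]

# A non-empty string occurs in itself exactly once, so this is always 1.
self_counts = [k.count(k) for k in mod_text]
print(self_counts)
# [1, 1, 1, 1, 1, 1, 1]

# Words of equal length map to the same key; the last value written wins.
dict1 = {len(k): k.count(k) for k in mod_text}
print(dict1)
# {5: 1, 4: 1, 2: 1, 1: 1}
```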

norok2 · Answer 1 · 2021-12-02T00:00:06.833

You cannot do this efficiently with a comprehension, because you would need a reference to the dict while it is being created, and this is not currently possible in Python. Instead, you could update the counter dict inside a plain loop: increment the counter if the key is already present in the dict, otherwise set it to one:

def count_words_by_length(words):
    counter = {}
    for word in words:
        n = len(word)
        if n in counter:
            counter[n] += 1
        else:
            counter[n] = 1
    return counter


mod_text = ["hello", "this", "is", "a", "list", "of","words"]
print(count_words_by_length(mod_text))
# {5: 2, 4: 2, 2: 2, 1: 1}
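
As a possible shorter spelling of the same loop, the membership test can be folded into `dict.get` with a default of 0. This is only a sketch of an equivalent variant, and the function name `count_words_by_length_get` is made up for illustration.

```python
def count_words_by_length_get(words):
    counter = {}
    for word in words:
        n = len(word)
        # dict.get returns 0 when the length has not been seen yet
        counter[n] = counter.get(n, 0) + 1
    return counter


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
print(count_words_by_length_get(mod_text))
# {5: 2, 4: 2, 2: 2, 1: 1}
```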

If you really want to use a dict comprehension, here are a couple of less efficient approaches:

Counting the number of words of a given length for each word. This is the least efficient, but the closest to your original approach. Every time a word with a given length is found, the count is recomputed from scratch, even if the dict already has an entry for that length.

def count_words_by_length_compr1(words):
    return {
        len(word): sum(1 for word_ in words if len(word_) == len(word))
        for word in words}


mod_text = ["hello", "this", "is", "a", "list", "of","words"]
print(count_words_by_length_compr1(mod_text))
# {5: 2, 4: 2, 2: 2, 1: 1}

Counting the number of words for every length between the minimum and the maximum word length, discarding entries with 0 counts. This may be more or less efficient than the above depending on the actual lengths of the words.

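def count_words_by_length_compr2(words):
    # min(words) / max(words) would compare the strings alphabetically,
    # so take the min / max of the lengths instead
    return {
        n: sum(1 for word in words if len(word) == n)
        for n in range(min(map(len, words)), max(map(len, words)) + 1)
        if any(len(word) == n for word in words)}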


mod_text = ["hello", "this", "is", "a", "list", "of","words"]
print(count_words_by_length_compr2(mod_text))
# {1: 1, 2: 2, 4: 2, 5: 2}

Same as above but with more efficient discarding (using the walrus operator `:=`, available since Python 3.8).

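def count_words_by_length_compr3(words):
    # range over the shortest to the longest word length (not the
    # alphabetically smallest / largest word)
    return {
        n: k
        for n in range(min(map(len, words)), max(map(len, words)) + 1)
        if (k := sum(1 for word in words if len(word) == n)) > 0}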


mod_text = ["hello", "this", "is", "a", "list", "of","words"]
print(count_words_by_length_compr3(mod_text))
# {1: 1, 2: 2, 4: 2, 5: 2}

Counting the number of words for each available length (pre-computed and stored in a set). This is a bit more time efficient, since the outer loop runs exactly as many times as needed (unlike all the previous comprehension-based solutions), at the expense of some more memory consumption.

def count_words_by_length_compr4(words):
    return {
        n: sum(1 for word in words if len(word) == n)
        for n in {len(word) for word in words}}


mod_text = ["hello", "this", "is", "a", "list", "of","words"]
print(count_words_by_length_compr4(mod_text))
# {1: 1, 2: 2, 4: 2, 5: 2}

CutePoison · Answer 2 · 2021-11-22T12:42:20.420

I might misunderstand your question, but you can count each occurrence using the Counter class in the collections module, e.g.:

from collections import Counter
mod_text = ["hello", "this", "is", "a", "list", "of","words"]
lengths = [len(p) for p in mod_text]  # get the length of each word
c = Counter(lengths)
print(c)
# Counter({5: 2, 4: 2, 2: 2, 1: 1})
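
If a plain dict is wanted, as in the expected output of the question, the same idea could be wrapped roughly as follows. This is a sketch, not part of the original answer; the function name `count_words_by_length_counter` is hypothetical. `Counter` accepts any iterable, so the intermediate list is optional.

```python
from collections import Counter


def count_words_by_length_counter(words):
    # Counter tallies each length in one pass; dict() drops the Counter wrapper
    return dict(Counter(len(word) for word in words))


mod_text = ["hello", "this", "is", "a", "list", "of", "words"]
print(count_words_by_length_counter(mod_text))
# {5: 2, 4: 2, 2: 2, 1: 1}
```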

edited Nov 22 '21 at 12:42

answered Nov 22 '21 at 12:37

Note that the question code doesn't use words as keys. – MisterMiyagi Nov 22 '21 at 12:39
I might misunderstand the question then – CutePoison Nov 22 '21 at 12:40
Counter is not a module. It's a function inside the collections module – Shekhar Samanta Nov 22 '21 at 12:41
It's actually a dictionary-subclass to be completely correct – CutePoison Nov 22 '21 at 12:44

Haukland · Answer 3 · 2021-11-25T08:58:46.463

You need to count k in mod_text, not in itself (which would always yield 1).

def words_lengths_map(text):
    mod_text = text.split()
    #['this','this','is', 'is', 'a']
    dict1 = {len(k): mod_text.count(k) for k in mod_text}
    print(dict1)
    #{4: 2, 2: 2, 1: 1}

words_lengths_map("this this is is a")

Edit: As norok2 pointed out in a comment below, this will find the number of occurrences of a given word, not a given word length.
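
The difference is easy to see on the list from the question, where no word is repeated. The following sketch (the variable names `by_word` and `by_length` are illustrative) compares counting occurrences of the word itself with counting words of the same length:

```python
mod_text = ["hello", "this", "is", "a", "list", "of", "words"]

# list.count(k) counts occurrences of the word k itself, so every value is 1 here
by_word = {len(k): mod_text.count(k) for k in mod_text}
print(by_word)
# {5: 1, 4: 1, 2: 1, 1: 1}

# counting words whose length matches gives the result the question asks for
by_length = {
    len(k): sum(1 for w in mod_text if len(w) == len(k))
    for k in mod_text}
print(by_length)
# {5: 2, 4: 2, 2: 2, 1: 1}
```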

edited Nov 25 '21 at 08:58

answered Nov 22 '21 at 12:42

This would not work because count will count occurrences of that specific word and not the words of given length. Even if you were to define a correct count function, this would not be very efficient, as you would have n² time complexity, while there are simpler approaches with n complexity (n being the size of the input list). – norok2 Nov 25 '21 at 01:06
