3

The following procedural code snippet computes the character frequency of a text string and writes inside a dictionary. The dictionary has the characters as keys and the frequency as values.

text = "asampletextstring"
char_count = {}
for char in text:
    if char_count.get(char):
        char_count[char] += 1
    else:
        char_count[char] = 1

My question is, is it possible to rewrite the above snippet as a comprehension?

Stefan
  • 1,151
  • 1
  • 8
  • 13
  • Why comprehension? – mad_ Oct 31 '18 at 18:55
  • 1
    Not comprehension, but shorter: `char_count = Counter(text)` – dvnguyen Oct 31 '18 at 18:57
  • @mad_ I wanted to write the snippet more concisely and also learn the `pythonic`way of doing things. @dvnguyen Yes i know about Counter, but I thought there might be an approach without importing anything... – Stefan Oct 31 '18 at 19:17
  • `dicts`and `lists` are also pythonic. Comprehension may reduce some lines of code but may not be as performant each time – mad_ Oct 31 '18 at 19:24

4 Answers4

4

It is possible, but is inefficient:

text = "asampletextstring"

char_count = { char : text.count(char) for char in text }

print(char_count)

Output

{'s': 2, 'x': 1, 'p': 1, 'm': 1, 'e': 2, 'r': 1, 'n': 1, 'g': 1, 'a': 2, 'i': 1, 'l': 1, 't': 3}

You could write a shorter version of your code:

char_count = {}
for char in text:
    char_count[char] = char_count.get(char, 0) + 1
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • counter is inefficient for the short string, given by the topic starter, but great for long text (say 10 000) see my discussions in measurement topic – Serge Oct 31 '18 at 21:09
  • provided one iterate over set(text) as suggested above – Serge Oct 31 '18 at 21:15
1

Could use set() here to avoid encountering the character 2 or more times.

text = "asampletextstring"
dict1 = {ch: text.count(ch) for ch in set(text)}

print(dict1)
{'s': 2, 'r': 1, 'i': 1, 'n': 1, 'a': 2, 'e': 2, 'p': 1, 't': 3, 'x': 1, 'l': 1, 'g': 1, 'm': 1}
Chris Charley
  • 6,403
  • 2
  • 24
  • 26
  • Counter is pretty efficient for long string and alphabets, accoriding to measurements in https://stackoverflow.com/questions/41594940/why-is-collections-counter-so-slow – Serge Oct 31 '18 at 20:29
1

Was curious to look into the performance of various approaches and to prove comprehensions are not good each time I did some analysis using dictionary comprehensions, dictionary comprehension by converting input to sets and traditional for loops. It makes sense why comprehension is expensive here as .count() is iterating over the entire text each time to count the frequency of single char

from timeit import timeit

print('Approach 1 without set compehrension: {}'.format(timeit ('{ch: text.count(ch) for ch in text}',setup='text = "asampletextstring"',number=1000000)))
print('Approach 2 with set compehrension: {}'.format(timeit ('{ch: text.count(ch) for ch in set(text)}',setup='text = "asampletextstring"',number=1000000)))
print('Approach 3 simple loops :{}'.format(timeit('for c in text:char_count[c] = char_count.get(c, 0) + 1',setup='text = "asampletextstring";char_count={};',number=1000000)))
print('Approach 4 Counter :{}'.format(timeit('Counter(text)',setup='text = "asampletextstring";from collections import Counter;',number=1000000)))

Output:

Approach 1 without set compehrension: 4.43441867505
Approach 2 with set compehrension: 3.98101747791
Approach 3 simple loops :2.60219633984
Approach 4 Counter :7.54261124884
mad_
  • 8,121
  • 2
  • 25
  • 40
  • Counter is pretty efficient for long string and alphabets, accoriding to measurements in https://stackoverflow.com/questions/41594940/why-is-collections-counter-so-slow – Serge Oct 31 '18 at 20:29
  • with setup='text = "asampletextstring" * 1000;from collections import Counter;',number=100))), the time is – Serge Oct 31 '18 at 20:51
  • Approach 1 without set, compehrension: 16.287531096022576 Approach 2 with set and compehrension: 0.02602811809629202 Approach 3 simple loops :0.19724294892512262 Approach 4 Counter :0.05493389000184834 – Serge Oct 31 '18 at 20:51
  • I don't understand your point here. My point was to compare the given text and not big and small sequence. And time value may differ depending on the operating system and processing power but relative time would still matter. Let me know if I have missed something obvious – mad_ Oct 31 '18 at 20:55
  • I was curious to look into the performance of various approaches and to prove that naive loop is not always as good as designated standard tools or modules. – Serge Oct 31 '18 at 20:59
  • I guess for small text the performance does not really matter, while for long one it might be more important. Also there is a non-negligable possiblity that topic starter provided short string just following the rules that require a 'minimal' example – Serge Oct 31 '18 at 21:01
  • And of course while exact timing differes on different machines, result shows that for long string count and Counter overperform a naive loop ( though count without set is not good idea) – Serge Oct 31 '18 at 21:03
  • you did a typo in "asampletextstring" * 1000, instead you used "asampletextstring * 1000" – Serge Nov 01 '18 at 15:21
0

Rewrite - not really, I do not see any easy way. The best I arrived requires an additional dictionary.

d = {}
{ c: d.get(c, 0)  for c in text if d.update( {c: d.get(c,0) + 1} ) or True}

One will be able to get one-liner in Python 3.8, but by (ab)using assignment expressions

Serge
  • 3,387
  • 3
  • 16
  • 34