I wrote the following code, which prints how often each character appears in a file named 'Test' (the 30 most frequent, ignoring case, spaces, and newlines):
from os import strerror
from collections import Counter

try:
    with open('Test', 'rt') as handle:
        content = handle.read().lower().replace(' ', '').replace('\n', '')
        counts = Counter(content)
        for i in sorted(counts, key=lambda x: counts[x], reverse=True)[:30]:
            print('{} -> {}'.format(i, counts[i]))
except IOError as e:
    print('I/O error occurred: ', strerror(e.errno))
The output is:
e -> 383
o -> 247
s -> 226
t -> 224
n -> 219
a -> 217
r -> 201
i -> 188
d -> 127
h -> 125
l -> 112
c -> 112
m -> 105
u -> 72
f -> 59
p -> 59
g -> 58
y -> 48
b -> 47
. -> 36
w -> 35
, -> 35
v -> 28
k -> 25
0 -> 15
- -> 9
% -> 8
1 -> 7
’ -> 7
x -> 7
Afterward I realized I only need the letters. I figured I have to modify line #6:
content = handle.read().lower().replace(' ', '').replace('\n', '')
I am aware I could write a for-loop that uses str.isalpha() as the condition
to remove the non-alphabetic characters.
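For reference, the loop I have in mind would look roughly like this. It is only a sketch: the helper name letters_only and the sample string are made up for illustration and are not part of my script. Note that str.isalpha() also returns True for non-ASCII letters such as 'é'.

```python
from collections import Counter


def letters_only(text):
    """Return text lowercased, keeping only alphabetic characters.

    Hypothetical helper, used here only to illustrate the for-loop idea.
    """
    kept = []
    for ch in text.lower():
        # isalpha() is False for digits, punctuation, and whitespace,
        # so those characters are skipped.
        if ch.isalpha():
            kept.append(ch)
    return ''.join(kept)


# Made-up sample text standing in for the contents of 'Test'.
sample = "Hello, World! 100%"
counts = Counter(letters_only(sample))
for char, count in counts.most_common(3):
    print('{} -> {}'.format(char, count))
```

In the script, the result of handle.read() would be passed to this helper in place of the chained replace() calls.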
I wonder if there is a better way to do that.
Thank you in advance for your feedback:-)