
I'm new to Python and am trying to write a frequency analysis program, which tells you how often each letter occurs in a text. Inside a function named 'freq_analysis' that takes a parameter 'text', I made each letter of the alphabet its own variable so that each one could be calculated and printed separately. Each variable looked like this:

    a = text.count('a' and 'A')

Each variable was identical to that apart from the letter. I made sure there were no backticks or spaces mixed with tabs on any line. I then created separate variables to calculate the percentages (how often each letter occurred in the text), which looked like this:

    aa = (a / (len(text) - blank)) * 100

Again, I checked that each of these variables was identical. The problem is that when I print each percentage, the only percentage that's calculated and printed is that of the letter T. It doesn't even include lowercase t; it's only capital T. I tested this by calling the function on the text 'tttTtt'. It returned 16.66 rather than 100, which is what it should have been. I'm printing each percentage with this code:

    print (aa,'%')

If I haven't been clear enough, I can provide more information; I have no idea what the problem is.

Bella E.

3 Answers


The reason you see this happening is that ('a' and 'A') doesn't behave as you expect:

    >>> ('a' and 'A')
    'A'
    >>> 'abfaAHA'.count('a' and 'A')
    2

The and operator evaluates its first operand ('a') and, if that is truthy, returns the second operand. Since all non-empty strings are truthy, 'A' is returned and that is what count uses. To count all occurrences of a and A, either count them separately or convert the string to lowercase and count a:

    >>> s = 'abfaAHA'
    >>> s.count('a') + s.count('A')
    4
    >>> s.lower().count('a')
    4
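To connect this to the 'tttTtt' test in the question, here is a small sketch. percentage is a hypothetical helper, and because the question's blank variable isn't shown, the sketch assumes the text contains no blanks and divides by len(text).

```python
def percentage(count, total):
    """Return count as a percentage of total."""
    return count / total * 100

text = 'tttTtt'

# The question's version: ('t' and 'T') evaluates to 'T',
# so only the capital letter is counted.
buggy = text.count('t' and 'T')

# Fixed version: count each case separately and add the results.
fixed = text.count('t') + text.count('T')

# Assumes no blanks, so the denominator is simply len(text).
print(buggy, round(percentage(buggy, len(text)), 2))  # 1 16.67
print(fixed, round(percentage(fixed, len(text)), 2))  # 6 100.0
```

The buggy line finds a single 'T' out of six characters, which is where the roughly 16.66% in the question comes from.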

Since you want the count for every letter, using Counter would make the task a lot easier. It builds a dictionary-like object from the string, where the keys are the characters and the values are their counts:

    >>> from collections import Counter
    >>> Counter(s.lower())
    Counter({'a': 4, 'h': 1, 'b': 1, 'f': 1})
niemmi

I think what you're looking for in the first example is:

    a = text.count('a') + text.count('A')

Explanation: the expression text.count('a' and 'A') always resolves to text.count('A'), because 'a' and 'A' evaluates to 'A'. The and operator evaluates its operands from left to right and returns the first falsy one, stopping there; if both are truthy, it returns the last one.
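A short sketch of that rule: shout is a hypothetical helper that records each operand as it is evaluated, so you can see both which value and returns and that it stops at the first falsy operand.

```python
evaluated = []

def shout(value):
    """Record that this operand was evaluated, then return it unchanged."""
    evaluated.append(value)
    return value

# Both operands truthy: and returns the last one.
first = shout('a') and shout('A')

# First operand falsy (empty string): and returns it and never calls shout('A').
second = shout('') and shout('A')

print(repr(first), repr(second), evaluated)  # 'A' '' ['a', 'A', '']
```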

Te-jé Rodgers

Others have explained the problem with using text.count('a' and 'A').

Since you want the frequencies to be case-insensitive, you can convert the incoming text to lowercase (or uppercase, if you prefer) before counting. At the top of your function, convert to lowercase:

    def f(text):
        text = text.lower()
        a = text.count('a')
        b = text.count('b')
        ...

However, this is not ideal: you will end up with 26 variables, which quickly becomes unwieldy. You are better off using a dictionary to keep a count for each character, or you can take a shortcut and go straight to a collections.Counter object, which does it all for you:

    from collections import Counter

    text = 'Hi there!'
    counts = Counter(text.lower())
    print(counts)
    # Counter({'e': 2, 'h': 2, '!': 1, ' ': 1, 'i': 1, 'r': 1, 't': 1})

Now counts holds the count of each and every character. It presents an interface like that of a dictionary, so:

    >>> counts['a']
    0
    >>> counts['e']
    2
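For comparison, here is a minimal sketch of the plain-dictionary approach mentioned earlier; count_chars is a hypothetical name. It also shows one practical difference: a Counter returns 0 for a missing key without adding it, while a plain dict raises KeyError.

```python
from collections import Counter

def count_chars(text):
    """Count each character of text case-insensitively with a plain dict."""
    counts = {}
    for c in text.lower():
        # dict.get supplies 0 the first time a character is seen.
        counts[c] = counts.get(c, 0) + 1
    return counts

plain = count_chars('Hi there!')
counter = Counter('Hi there!'.lower())
print(plain == dict(counter))  # True: same keys and counts

print(counter['a'])    # Counter: a missing key gives 0
print('a' in counter)  # False: the lookup did not add 'a'
try:
    plain['a']
except KeyError:
    print('plain dict: KeyError')
```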

Because you want to ignore non-alphabetic characters, you can filter them out using string.ascii_lowercase:

    import string
    from collections import Counter

    text = 'Hi there!'
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)

Now neither '!' nor ' ' appears in counts.

To calculate the percentages:

    n_letters = sum(counts.values())  # total number of letters counted
    for c in sorted(counts):
        print('{}: {:.2f}'.format(c, counts[c] / n_letters * 100))

    e: 28.57
    h: 28.57
    i: 14.29
    r: 14.29
    t: 14.29
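In frequency analysis it is often more useful to list letters from most to least common than alphabetically. A minimal variant of the loop above using Counter.most_common might look like this; it rebuilds counts from the same 'Hi there!' example so that it runs on its own.

```python
import string
from collections import Counter

text = 'Hi there!'
counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
n_letters = sum(counts.values())

# most_common() yields (letter, count) pairs, highest count first;
# letters with equal counts keep the order in which they first appeared.
for c, n in counts.most_common():
    print('{}: {:.2f}'.format(c, n / n_letters * 100))
```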

If you want every letter of the alphabet, including those that do not appear in the text:

    n_letters = sum(counts.values())
    for c in string.ascii_lowercase:
        if c in counts:
            frequency = counts[c] / n_letters * 100
        else:
            frequency = 0
        print('{}: {:.2f}'.format(c, frequency))
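Putting the pieces together, here is a sketch of a freq_analysis(text) function (the name from the question) that returns a percentage for every letter instead of using 26 separate variables. Treating only ASCII letters as letters is an assumption carried over from the string.ascii_lowercase filter, and returning 0.0 for text with no letters is a choice made here to avoid dividing by zero.

```python
import string
from collections import Counter

def freq_analysis(text):
    """Return {letter: percentage} for all 26 letters, ignoring case and non-letters."""
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    n_letters = sum(counts.values())
    # Counter gives 0 for letters that never appear; the n_letters check
    # guards against text that contains no letters at all.
    return {c: (counts[c] / n_letters * 100 if n_letters else 0.0)
            for c in string.ascii_lowercase}

freqs = freq_analysis('tttTtt')
print(freqs['t'], '%')  # 100.0 %
```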
mhawke