
I want to search a text and count how many times selected words occur. For simplicity, I'll say the text is "Does it fit?" and the words I want to count are "it" and "fit".

I've written the following code:

mystring = 'Does it fit?'
search_words = 'it', 'fit'
for sw in search_words:
    frequency = {}
    count = mystring.count(sw.strip())
    output = (sw + ',{}'.format(count))
    print(output)

The output is

it,2
fit,1

because the code counts the 'it' in 'fit' towards the total for 'it'.

The output I want is

it,1
fit,1

I've tried changing line 5 to count = mystring.count('\\b'+sw+'\\b'.strip()) but the count is then zero for each word. How can I get this to work?

Wiktor Stribiżew
Belmonte
    Does this answer your question? [Finding occurrences of a word in a string in python 3](https://stackoverflow.com/questions/17268958/finding-occurrences-of-a-word-in-a-string-in-python-3) – Random Davis Oct 02 '20 at 15:26

3 Answers

The search_words = 'it', 'fit' line creates a tuple rather than a list. That works, but here's another way to do it:

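bad_chars = [';', ':', '!', '*', '?', '.']
# Strip punctuation once, then split the text into words.
string = ''.join(filter(lambda c: c not in bad_chars, "does it fit?"))
words = string.split(" ")

res = {}
for word in ["it", "fit"]:
    # Count exact word matches only, so 'fit' does not count towards 'it'.
    res[word] = sum(1 for w in words if w == word)

print(res)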

str.count() counts substring occurrences, so the 'it' inside 'fit' was counted too, which is why you were getting 2 occurrences of 'it'.

Here the words are compared directly after removing punctuation and special characters.

output:

{'it': 1, 'fit': 1}
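The same whole-word comparison can also be sketched with collections.Counter, which is a different technique from the loop above: the text is tokenized once and each search word is then looked up. The names text, tokens and token_counts are only illustrative, and lowercasing the text is an assumption about how you want case handled.

```python
import re
from collections import Counter

text = 'Does it fit?'
search_words = ['it', 'fit']

# Split the text into lowercase word tokens; \w+ skips punctuation such as '?'.
tokens = re.findall(r'\w+', text.lower())
token_counts = Counter(tokens)

# A Counter returns 0 for words that never occur, so no special case is needed.
res = {word: token_counts[word] for word in search_words}
print(res)  # {'it': 1, 'fit': 1}
```

This scans the text once however many search words there are, which may matter for longer texts.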
Ironkey

The problem with the regex attempt in your question lies with str.count() rather than with the pattern itself.

str.count() (docs) returns the number of non-overlapping occurrences of the substring passed as its argument within the string it is called on, so 'lots of love'.count('lo') returns 2. However, str.count() only matches literal substrings. It does not interpret regular expression patterns, so '\\b' is searched for as a literal backslash followed by 'b'.
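A quick way to see this for yourself, as a minimal sketch (the variable names are only for illustration):

```python
mystring = 'Does it fit?'

# str.count() counts non-overlapping literal substrings.
lo_count = 'lots of love'.count('lo')   # 'lo' in 'lots' and in 'love'
it_count = mystring.count('it')         # 'it' on its own plus the 'it' inside 'fit'

# '\\b' is a literal backslash followed by 'b', not a word boundary,
# so this search string never appears in the text.
boundary_count = mystring.count('\\b' + 'it' + '\\b')

print(lo_count, it_count, boundary_count)  # 2 2 0
```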

The solution below uses your original pattern with the built-in re module and should work for you.

import re

mystring = 'Does it fit?'
search_words = 'it', 'fit'

results = dict()

for sw in search_words:
    count = re.findall(rf'\b{sw}\b', mystring)
    results[sw] = len(count)  # findall returns a list of matches (empty if none)

for k, v in results.items():
    print(f'{k}, {v}')

If you want to match search_words regardless of case, so that occurrences such as 'Fit', 'FIT' or 'fIt' in mystring are included in the count stored in results['fit'], change the line:

    count = re.findall(rf'\b{sw}\b', mystring)

to

    count = re.findall(rf'\b{sw}\b', mystring, re.IGNORECASE)
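If the search words might contain characters that are special in regular expressions (for example '.' or '+'), one option is to pass them through re.escape() first. The sketch below wraps the same idea in a helper. count_whole_words and its ignore_case parameter are hypothetical names, not part of the original answer.

```python
import re

def count_whole_words(text, words, ignore_case=False):
    """Return {word: number of whole-word matches of word in text}."""
    flags = re.IGNORECASE if ignore_case else 0
    counts = {}
    for word in words:
        # re.escape() makes characters such as '.' or '+' match literally.
        pattern = rf'\b{re.escape(word)}\b'
        counts[word] = len(re.findall(pattern, text, flags))
    return counts

results = count_whole_words('Does it fit? It fits.', ['it', 'fit'], ignore_case=True)
print(results)  # {'it': 2, 'fit': 1}
```

Note that \b only marks a boundary between a word character and a non-word character, so search words that begin or end with punctuation may not match the way you expect.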
JPI93
  • Thanks, but this is giving me an error message: `for k, v in results: ValueError: too many values to unpack (expected 2)` – Belmonte Oct 02 '20 at 16:21
  • @Belmonte Whoops, sorry about that. Fixed in update to my answer. Penultimate line should be `for k, v in results.items():` – JPI93 Oct 02 '20 at 16:25

Try this:

def count_words(string, *search_words):
    # Split on whitespace and strip common punctuation from the ends of each word.
    words = [word.strip('?.,!;:') for word in string.split()]
    # Count exact matches of each search word in the cleaned word list.
    frequency_dict = {word: words.count(word) for word in search_words}
    for word, count in frequency_dict.items():
        print(f'{word}, {count}')

You can call it like this:

count_words('Does it it it fit fit it?', 'it', 'fit')

And the output is:

it, 4
fit, 2